Abstract

This study critically analyses the information-theoretic, axiomatic and combinatorial philosoph- 
ical bases of the entropy and cross-entropy concepts. The combinatorial basis is shown to be 
the most fundamental (most primitive) of these three bases, since it gives (i) a derivation for the 
Kullback-Leibler cross-entropy and Shannon entropy functions, as simplified forms of the multino- 
mial distribution subject to the Stirling approximation; (ii) an explanation for the need to maximize 
entropy (or minimize cross-entropy) to find the most probable realization of a system; and (iii) 
new, generalized definitions of entropy and cross-entropy - supersets of the Boltzmann principle 
- applicable to non-multinomial systems. The combinatorial basis is therefore of much broader 
scope, with far greater power of application, than the information-theoretic and axiomatic bases. 
The generalized definitions underpin a new discipline of "combinatorial information theory", for
the analysis of probabilistic systems of any type. 

Jaynes' generic formulation of statistical mechanics for multinomial systems is re-examined in 
light of the combinatorial approach, including the analysis of probability distributions, ensem- 
ble theory, Jaynes relations, fluctuation theory and the entropy concentration theorem. Several 
new concepts are outlined, including a generalized Clausius inequality, a generalized free energy 
("free information") function, and a generalized Gibbs-Duhem relation and phase rule. For non- 
multinomial systems, the generalized approach provides a different framework for the reinterpre- 
tation of the many alternative entropy measures (e.g. Bose-Einstein, Fermi-Dirac, Renyi, Tsallis, 
Sharma-Mittal, Beck-Cohen, Kaniadakis) in terms of their combinatorial structure. A connection 
between the combinatorial and Bayesian approaches is also explored. 

PACS numbers: 02.50.Cw, 02.50.Tt, 05.20.-y, 05.40.-a, 05.70.-a, 05.70.Ce, 05.90.+m, 89.20.-a, 89.70.+c
Keywords: entropy, cross-entropy, directed divergence, probability, information theory, bits, axiomatic, 
combinatorial, Boltzmann principle, thermodynamics, statistical mechanics, free energy, Jaynes, maximum 
entropy. 
I. INTRODUCTION



The concept of entropy - a measure of the lack of order of a system - is one of the 
most profound human discoveries, with implications for virtually all disciplines of human 
study. The thermodynamic entropy was first defined by Clausius [1] in terms of the exact
differential dS, given by the quantity of heat transferred reversibly to a system, dQ, scaled 
by the absolute temperature T of the system: 

$$dS = \frac{dQ}{T} \tag{1}$$

Consideration of irreversible (non-equilibrium) processes - as expressed by the second law
of thermodynamics - gives the Clausius inequality [1, 2]:

$$dS \geq \frac{dQ}{T} \tag{2}$$

The combinatorial basis of entropy was revealed by Boltzmann [3] and Planck [4, 5] in the
famous equation:

$$S_N = N S = k \ln W \tag{3}$$

where $S_N$ is the total thermodynamic entropy of the system, N is the number of entities
(discrete particles or agents) present, S is the thermodynamic entropy per entity, W is the
number of ways in which a specified realization$^1$ of a system can occur, known as its statistical
weight, and k is the Boltzmann constant ($1.38 \times 10^{-23}$ J K$^{-1}$ entity$^{-1}$). Whilst the
combinatorial definition is well-known in physics, most thermodynamicists, information theorists and
mathematicians use the information entropy of Shannon [6] (or a simple multiple thereof,
as given by Boltzmann [3]):

$$H = -\sum_{i=1}^{s} p_i \ln p_i \tag{4}$$

where $p_i$ is the probability of occurrence of the ith distinguishable outcome or state, from s
such states, and p = {$p_i$} is the probability distribution (probability mass function). In the
maximum entropy principle ("MaxEnt") developed by Jaynes [7], (4) is maximized subject



1 Here the state refers to each different category (e.g. boxes, energy levels, elements or results) accessible 
to a system; a configuration is a distinguishable permutation or pattern of entities amongst the states
of a system (complexion, microstate, sequence); and a realization is an externally identifiable set of such 
configurations, grouped in a specified manner (macrostate, outcome, type). 
to the constraints on a system, giving a probability distribution p* which is considered
to be the most uncertain, the least informative or the least committed to information not
given, and is therefore used to represent the system [10, 11, 14]. It has
been shown that the central tenets of thermodynamics can be derived directly (and more
naturally) using the MaxEnt principle without recourse to any other laws. The
Shannon entropy is itself a subset of the Kullback-Leibler directed divergence, cross-entropy
or relative entropy$^2$ function [10, 14, 16], in discrete form:

$$D(\mathbf{p}|\mathbf{q}) = \sum_{i=1}^{s} p_i \ln \frac{p_i}{q_i} \tag{5}$$



where $q_i$ and $p_i$ are respectively the prior and posterior probability of occurrence of the ith
result, p = {$p_i$}, q = {$q_i$} and the solidus | is the Bayesian "subject to". In the minimum
cross-entropy principle ("MinXEnt") - a superset of MaxEnt - (5) is minimized subject to
the constraints on a system, to give the distribution p* which contains the least information,
yet is closest to q [14]. Collectively, the definitions above underpin present-day statistical
physics, thermodynamics, information theory and encoding, optimization and data analysis,
whilst the MaxEnt and MinXEnt methods have found widespread application to a vast
assortment of fields, including information technology, communications, mathematics, science,
engineering, economics, decision theory, geography, linguistics and the social sciences [e.g.
9, 13, 20].
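
The sense in which the Shannon entropy is a special case of the Kullback-Leibler function can be made explicit with one worked step. The following is a sketch that assumes a uniform prior, $q_i = 1/s$ for all i (written u here), and uses $H = -\sum_i p_i \ln p_i$ for the Shannon entropy:

```latex
\begin{aligned}
D(\mathbf{p}|\mathbf{u})
  &= \sum_{i=1}^{s} p_i \ln \frac{p_i}{1/s}
   = \sum_{i=1}^{s} p_i \ln p_i + \ln s \sum_{i=1}^{s} p_i \\
  &= \ln s - H .
\end{aligned}
```

Under this assumption, minimizing D with respect to p is the same operation as maximizing H, since ln s is a constant.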



Attention must, however, be directed towards two points. Firstly, although the MaxEnt
and MinXEnt methods are widely considered to fall within the scope of Bayesian inferential
reasoning [e.g. 21, 22], no such Bayesian derivation appears to be evident in the literature,
and the two methods remain largely underpinned by axiomatic arguments [23, 24]. Since
these arguments can be varied and are subject to debate, the circumstances in which the
MaxEnt and MinXEnt methods remain valid, or require modification, are not clear. Secondly,
there is now widespread controversy in many of the above-mentioned fields - especially
in statistical physics - due to the promulgation of a variety of alternative entropy functions,
inconsistent with the above definitions, for specific applications. These include the Fisher
information measure [25, 26]; the Bose-Einstein and Fermi-Dirac entropies of quantum
mechanics [27, 28, 29, 30, 31, 32]; the Renyi [34], Tsallis [35, 36], Sharma-Mittal [37, 38]



2 The relative entropy is usually defined as the negative of (5); a few authors define the cross-entropy
differently.
and other entropies of non-extensive (correlated) statistics; Beck-Cohen superstatistics;
the Kaniadakis entropy of relativity theory [40, 41]; the "exact" entropies of the author
[42, 43], and many others [e.g. 20, 44, 45, 48, 49, 50, 51]. In recent years, there
has been a tremendous surge of interest in such alternative measures; e.g. the Tsallis
(non-extensive) literature alone contains over 1000 refereed journal articles since 1988, with 141
in 2005. Despite this high level of activity, the fundamental meaning of such alternative 
entropy functions, and how they fit into the combinatorial scheme of Boltzmann, is still not 
well understood. 

This discussion highlights the fact that the entropy concept has many different philo- 
sophical bases. In addition to (i) the combinatorial basis of entropy, other bases include: 



(ii) The information-theoretic basis [6, 52, 53, 54, 55, 56, 57, 58], in which entropy is
defined in terms of the number of bits of information needed to describe a particular
system, and/or in terms of coding theory;

(iii) The axiomatic basis [6], in which the desired properties of an entropy measure - its
axioms - are listed and used for its derivation;

(iv) The inverse modelling approach of Kapur and Kesavan [13, 14, 59, 60, 61, 62], in which
one works backwards from an observed probability distribution p*, a priori distribution
q (if available) and any constraints, to derive the measure of cross-entropy or entropy
applicable to a system;

(v) The game-theoretic basis [63, 64, 65, 66, 67, 68, 69], in which an entropy function is
derived by analysis of a game between two or more players; and

(vi) The information-geometric or statistical manifold basis [70, 71, 72, 73], in which an
information measure is analysed using a geometric representation.

Bases (ii) and (iii) are popular in information theory, (v) in economics, business and military 
strategy, (vi) in statistics, probability theory and mathematics, whilst (iv) is less well known. 
Whilst each basis has a following in its own discipline, the relationships between the different 
bases are still largely unexplored. Furthermore, whether any basis can claim supremacy over 
the other bases, or whether they are of equal philosophical standing, is a question which has 
not been adequately addressed. 
The aim of this article - which follows two previous studies [42, 43] - is to critically examine
the philosophical bases of the entropy and cross-entropy concepts, with particular attention 
to the information-theoretic, axiomatic and combinatorial interpretations. Using the com- 
binatorial basis, it is shown (following a well-trodden road) that both the cross-entropy and 
entropy functions are simplified forms of the logarithm of the multinomial distribution; they 
are therefore only shorthand functions to determine the "most probable" (MaxEnt or MinX- 
Ent) realization of a system which follows the multinomial distribution, without the necessity 
of invoking this distribution itself. The Kullback-Leibler cross-entropy and Shannon entropy 
functions are therefore secondary concepts, based firmly on simple combinatorial principles. 
This perspective lies in stark contrast to the axiomatic and information-theoretic bases, 
both of which take the cross-entropy or (especially) the entropy function as the fundamental 
concept and starting point for analysis. Since it rests upon a more definitive philosophical 
foundation, the combinatorial basis is the most fundamental (most primitive) of these three 
bases. It is also of much broader scope, leading naturally to new, generalized combinatorial 
definitions of cross-entropy and entropy - each a superset of the Boltzmann principle (3) -
for the analysis of any probabilistic system, irrespective of whether it is governed by the 
multinomial distribution. (Such definitions permit the reinterpretation of the many alter- 
native cross-entropy and entropy measures - e.g. Bose-Einstein, Fermi-Dirac, Renyi, Tsallis, 
Sharma-Mittal, Beck-Cohen, Kaniadakis, etc - in light of their combinatorial structure.) 
The revised definitions underpin the development of a new, broad discipline of combinato- 
rial information theory, spanning the entirety of present-day statistical physics, information 
theory and probability theory, for the analysis of probabilistic systems of every type. 
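
The claim that the cross-entropy is a simplified form of the logarithm of the multinomial distribution can be checked numerically. The sketch below is illustrative only: the distributions `p` and `q`, the helper names and the chosen values of N are assumptions, not taken from the text. It compares (1/N) ln P for the realization n = Np with -D(p|q), and the gap should shrink as N grows.

```python
import math

def log_multinomial(n, q):
    """ln P(n|q) for occupancies n and prior q, using lgamma(x + 1) = ln x!."""
    total = sum(n)
    return (math.lgamma(total + 1)
            - sum(math.lgamma(ni + 1) for ni in n)
            + sum(ni * math.log(qi) for ni, qi in zip(n, q) if ni > 0))

def kl(p, q):
    """Kullback-Leibler cross-entropy D(p|q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]   # assumed observed proportions
q = [0.2, 0.3, 0.5]   # assumed prior probabilities

def gap(N):
    """|(1/N) ln P + D(p|q)| for the realization n_i = N p_i."""
    n = [round(N * pi) for pi in p]
    return abs(log_multinomial(n, q) / N + kl(p, q))

for N in (10, 1000, 100000):
    print(N, gap(N))
```

The residual gap comes from the terms of the Stirling approximation that are dropped in the large-N limit.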

After early drafts of this work were completed [74], the author's attention was alerted to
several works of Grendar and Grendar [75, 76, 77, 78] who adopt a similar philosophical
argument, albeit with somewhat different aims and a different scope. In fact, the central
premise of this study has been known since the time of Boltzmann [3], played a critical role in
the discovery of Bose-Einstein and Fermi-Dirac (quantum) statistics [27, 28, 29, 30, 31], and
to some extent provides a motivation for present-day large deviations theory [e.g. 80, 81],
but for some reason has not been developed to its logical conclusion, viz. into generalized
combinatorial definitions of entropy and cross-entropy. The study therefore encompasses
and expands upon the combinatorial arguments used in classical and quantum statistical
mechanics [e.g. 32, 33, 82, 83, 84, 85, 86, 87, 88, 89]. Such arguments tend to be
examined only in passing by most information theorists, although there are some notable
exceptions [e.g. 10, 11, 15, 18, 90, 91].



This work is organised as follows. In §II A-II C, the main elements of the information-theoretic,
axiomatic and combinatorial bases of entropy and cross-entropy are critically
examined, leading to combinatorial derivations of the Shannon and Kullback-Leibler measures,
which reveal their purpose to determine the "most probable" (modal) probability
distribution of a multinomial system. Several technical aspects are then scrutinized in detail:
zero reference states for entropy or information; ensemble theory and multicomponent
systems; and the "generic" formulation of statistical mechanics developed by Jaynes [e.g.
10, 11, 12, 14, 15]. The latter is reinterpreted and extended for a multinomial
system in light of the combinatorial approach, with the derivation of new concepts including
a generalized Clausius inequality, a generalized free energy ("free information") function, a
generalized Gibbs-Duhem relation and phase rule, and a reappraisal of fluctuation theory
and Jaynes' entropy concentration theorem. In §III, the significance of the multinomial
distribution is reviewed, leading to the proposition of generalized definitions of entropy and
cross-entropy for non-multinomial systems. A connection to Bayesian statistical inference,
and the other bases of entropy, are also discussed.

In the following, an entity is taken to be any discrete particle, object or agent within a 
system, which acts separately but not necessarily independently of the other entities present 
(note this definition encompasses human beings). The entity therefore constitutes the unit 
of analysis of the system, although of course some entities can be further examined in terms 
of their constituent sub-entities, if desired. 



II. THEORETICAL ROOTS OF THE INFORMATION ENTROPY CONCEPT 

What is entropy? This question has certainly occupied (or been dismissed from) the minds 
of millions of college and university students for one and a half centuries - predominantly in 
physics, chemistry, engineering and informatics - and undoubtedly tens of thousands more 
of their professional elders in all disciplines. To endeavour to answer this question, in this 
section the first three theoretical or philosophical roots of the entropy and cross-entropy 
concepts listed in §I are examined. The first two, information-theoretic and axiomatic, are
so closely intertwined in the literature that it is not possible to distinguish them clearly. The 
third origin, based on combinatorial analysis, is somewhat distinct, and occupies much of this
work. Discussion of the remaining three bases of entropy (inverse modelling, game-theoretic
and information-geometric) is postponed until later in the text (§III C). A rival approach
to the analysis of probabilistic systems, which invokes the continuous Fisher information
[26, 94], is examined in detail elsewhere [95].



A. The Information-Theoretic (Bits) Approach 

The first theoretical basis of the Shannon entropy - although not the first in historical
development - concerns the number of bits of information required to specify a particular
system or outcome [6, 52, 53, 54, 55, 56, 57, 58]. Consider the binary entropy or B-entropy:

$$B = -\sum_{i=1}^{s} p_i \log_2 p_i \tag{6}$$

related to the Shannon entropy (defined using the natural logarithm, (4)) by $H = B \ln 2$.
Now consider a random variable which may take one of two states, of equal probability
$p_i = \frac{1}{2}$, $i = 1, 2$. Initially, the state of the variable is not known. After a binary decision (a
process of selection or measurement) it is found to be in one of these states (say $p_1 = 1$) and
not the other ($p_2 = 0$). The initial and final binary entropies are therefore:

$$B_{init} = -2 \left( \tfrac{1}{2} \log_2 \tfrac{1}{2} \right) = 1, \qquad B_{final} = -\left( 1 \log_2 1 + 0 \log_2 0 \right) = 0 \tag{7}$$



(Here and subsequently, we take $0 \log 0 = \log 0^0 = \log 1 = 0$ for all logarithmic bases). The
change in entropy is then:

$$\Delta B = B_{final} - B_{init} = -1 \tag{8}$$



If we define the change in information as the negative of the change in entropy (i.e., entropy
lost = information gained) [53, 54, 55, 83, 96, 97], the gain in information - reflecting our
improved state of knowledge - is:

$$\Delta I = -\Delta B = 1 \tag{9}$$



Thus for a simple binary decision, the information gained (entropy lost) corresponds to one 
bit of information. The decrease in entropy therefore provides a quantitative measure of the 
information gained by observation of a system. 
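
The binary-decision argument above can be reproduced in a few lines. This is a minimal sketch; the function name `binary_entropy` is an illustrative choice, and the convention 0 log 0 = 0 is implemented by skipping zero-probability states.

```python
import math

def binary_entropy(p):
    """Binary entropy B = -sum_i p_i log2 p_i, taking 0 log 0 = 0."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

b_init = binary_entropy([0.5, 0.5])    # state unknown: two equiprobable states
b_final = binary_entropy([1.0, 0.0])   # state known after the binary decision
info_gain = -(b_final - b_init)        # information gained = entropy lost

print(b_init, b_final, info_gain)
```

The same function applied to s equiprobable states returns log2 s, the number of binary decisions needed to identify one state.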

If we adopt a scaled binary entropy $S_B = -k \sum_{i=1}^{s} p_i \log_2 p_i$, the information gained by a
binary decision is k, measured in the units of k. For a scaled entropy based on the natural
logarithm, $S = -k \sum_{i=1}^{s} p_i \ln p_i$, the gain in information is $k \ln 2$ [52]. For thermodynamic
systems for which k is the Boltzmann constant, 1 bit of information corresponds to an
energy transfer of $9.57 \times 10^{-24}$ J K$^{-1}$ entity$^{-1}$. To access information carried by photons,
and distinguish them from the background (thermal) radiation, it is necessary to account
for the effect of temperature [55]; in this case, 1 bit of information corresponds to $kT \ln 2$
energy units per entity.
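
The conversion between bits and thermodynamic units quoted above is simple arithmetic, sketched here. The function names and the example temperature of 300 K are assumptions for illustration; the constant is the exact SI value of the Boltzmann constant, which the text rounds to 1.38 x 10^-23.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K per entity (exact SI value)

def bits_to_entropy(bits, k=K_BOLTZMANN):
    """Scaled (natural-logarithm) entropy equivalent of a number of bits: k ln 2 per bit."""
    return bits * k * math.log(2)

def bits_to_energy(bits, temperature, k=K_BOLTZMANN):
    """Energy equivalent at absolute temperature T: k T ln 2 per bit."""
    return bits * k * temperature * math.log(2)

print(bits_to_entropy(1))        # about 9.57e-24 J/K
print(bits_to_energy(1, 300.0))  # about 2.87e-21 J at an assumed 300 K
```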

A second variant of the information-theoretic definition - which overlaps with the axiomatic
approach (§II B) - is to consider a random variable which may take s equally probable
states. We define a measure of uncertainty as:

$$U = \ln s \tag{10}$$

As the states are equally probable, $s = 1/p_i$, $\forall i$, hence $U = -\ln p_i$. The mathematical
expectation of the uncertainty is $\langle U \rangle = -\sum_{i=1}^{s} p_i \ln p_i = H$, the Shannon entropy. As
the states are equally probable, this reduces to $\langle U \rangle = U$.

For states which are not equally probable, we may thus adopt the Shannon entropy as
a measure of the expectation of the uncertainty [6]. We can further define the surprisal or
self-information associated with each result [45]:

$$-\ln p_i \tag{11}$$

The entropy is therefore the expectation of the surprisal.



The surprisal has also been defined relative to the prior probability of that result,
$\ln(p_i/q_i)$, i.e. as the amount of information gained by a decision or message [45].
This is better referred to as the cross-surprisal. The expectation of the cross-surprisal gives
the cross-entropy (5). The cross-entropy is therefore a measure of the expected information
relative to what is known. Another useful term is the function $H_i = -p_i \ln p_i$, here termed the
weighted surprisal or partial entropy, which when summed over all states gives the Shannon
entropy [c.f. 57, 58, 99, 100]. The analogous function $D_i = p_i \ln(p_i/q_i)$ can be termed the
weighted cross-surprisal or partial cross-entropy.
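
The relationships between surprisal, cross-surprisal and their expectations can be illustrated as follows. This is a sketch only; the function names and the example distribution are assumptions chosen for illustration.

```python
import math

def surprisal(p_i):
    """Surprisal (self-information) of a result with probability p_i."""
    return -math.log(p_i)

def cross_surprisal(p_i, q_i):
    """Surprisal relative to the prior probability q_i: ln(p_i / q_i)."""
    return math.log(p_i / q_i)

def shannon_entropy(p):
    """Expectation of the surprisal, i.e. the sum of the partial entropies."""
    return sum(pi * surprisal(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expectation of the cross-surprisal, i.e. the Kullback-Leibler D(p|q)."""
    return sum(pi * cross_surprisal(pi, qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]      # assumed posterior distribution
u = [1.0 / 3.0] * 3        # uniform prior over s = 3 states

H = shannon_entropy(p)
D = cross_entropy(p, u)
print(H, D, math.log(3) - H)   # D equals ln s - H for a uniform prior
```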

The third and strongest variant of the information-theoretic approach relates to information
coding [e.g. 58], in which an alphabet $A = \{a_i\}$ with known or inferred probabilities
$\{p_i\}$ is mapped to a binary code$^3$, with corresponding codeword lengths $\{\kappa_i\}$, $\kappa_i \in \mathbb{N}$, $\forall i$. To



3 In general, A can be mapped to a code alphabet $K = \{k_i\}$ of any size [58].
minimize the mean codeword length, we consider the binary entropy for integer codeword
lengths, here written $\tilde{B}$:

$$\tilde{B} = \min_{\kappa \in \text{all codes}} \sum_{i=1}^{s} p_i \kappa_i \tag{12}$$

To obtain an instantaneous or readily decipherable code, it is common practice to seek a
prefix-free code, in which no codeword is a prefix of any other; the codeword lengths are
then subject to the Kraft inequality [58]:

$$\sum_{i=1}^{s} 2^{-\kappa_i} \leq 1 \tag{13}$$

Minimization of (12) with respect to $\kappa_i$ subject to (13) by the Lagrangian method (see
§II C 2), with normalization ($\sum_{i=1}^{s} p_i = 1$), yields a discontinuous binary entropy:

$$\tilde{B} = \sum_{i=1}^{s} p_i \lceil -\log_2 p_i \rceil \tag{14}$$

where $\lceil x \rceil$ is the ceiling of x (the smallest integer greater than or equal to x), which arises
since $\kappa_i$ must be an integer. By repeated m-fold sampling of (14), the two entropies converge:

$$B = \lim_{m \to \infty} \frac{\tilde{B}_m}{m} \tag{15}$$

where $\tilde{B}_m$ denotes (14) evaluated for blocks of m symbols. The entropy B therefore indicates
the minimum mean (possibly fractional) number of bits per symbol, whilst $\tilde{B}$ is the
equivalent quantity based on integer codeword lengths.
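
The coding argument can be illustrated numerically. The sketch below assumes codeword lengths given by the ceiling of -log2 p_i, checks the Kraft inequality, and evaluates the integer-length mean for blocks of m independent symbols. The example alphabet probabilities and all function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math
from itertools import product

def binary_entropy(p):
    """B = -sum_i p_i log2 p_i (bits per symbol, possibly fractional)."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def ceiling_code_lengths(p):
    """Integer codeword lengths kappa_i = ceil(-log2 p_i)."""
    return [math.ceil(-math.log2(pi)) for pi in p]

def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality, sum_i 2^(-kappa_i)."""
    return sum(2.0 ** -k for k in lengths)

def mean_length(p, lengths):
    """Mean codeword length sum_i p_i kappa_i."""
    return sum(pi * k for pi, k in zip(p, lengths))

def block_mean_length_per_symbol(p, m):
    """Integer-length mean for blocks of m independent symbols, divided by m."""
    total = 0.0
    for block in product(p, repeat=m):
        pb = math.prod(block)                      # probability of the block
        total += pb * math.ceil(-math.log2(pb))
    return total / m

p = [0.4, 0.3, 0.2, 0.1]                 # assumed symbol probabilities
lengths = ceiling_code_lengths(p)
B = binary_entropy(p)
print(lengths, kraft_sum(lengths), B, mean_length(p, lengths))
for m in (1, 2, 3, 4):
    print(m, block_mean_length_per_symbol(p, m))   # lies in [B, B + 1/m)
```

For probabilities that are exact negative powers of two, the ceiling has no effect and the two quantities coincide without any blocking.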

The above three information-theoretic roots of the Shannon entropy are of tremendous 
utility, primarily to information theory and coding applications. However, the first two vari- 
ants suffer from the deficiency that they assume that measures of information (or entropy) 
should be of logarithmic form, an assumption in part derived from the axiomatic approach
(§II B). Certainly, other functions could yield one bit of information for a binary decision.
The third variant assumes that the mean code length is the appropriate quantity to
be minimized; this is reasonable for coding applications, but does not necessarily apply 
to other situations. Furthermore, the Kraft inequality - which gives rise to the logarithm 
in the binary entropy - is not universal in application (e.g. to fixed-length codes, codes 
incorporating redundancy, etc), and warrants further examination. In consequence, the 
information-theoretic definitions of entropy and cross-entropy have a narrow philosophical 
basis, which does not necessarily apply outside their domain of application. 
B. The Axiomatic Approach

The second theoretical basis of the entropy concept, developed by Shannon [6], proceeds
by listing the desired properties of a measure of uncertainty - its axioms or desiderata -
and finding the mathematical function which satisfies these axioms. Shannon [6] considered
three axioms: continuity, monotonicity and recursivity (the branching principle), from
which the Shannon entropy (4) is uniquely obtained. To Shannon's original list, many
additional axioms have been added: e.g. uniqueness, permutational symmetry (invariance),
non-negativity, non-impossibility, inclusivity, decisivity, concavity, maximum entropy at
uniformity (normality), additivity, strong additivity, subadditivity, system independence and
subset independence [e.g. 14, 20, 23, 44, 50, 101]. The Shannon entropy is the only
function which satisfies these axioms. Indeed, it may be deduced from several small subsets
of these axioms, implying that they are not independent [e.g. 14, 21, 47].

It must be noted that the definition of thermodynamic entropy (3) by Planck [5, §118]
is derived by an axiomatic argument, assuming multiplicity of the weights and additivity of
the entropy function. Similarly, in the "plausible reasoning" treatises of Cox [102, p37] and
Jaynes [15, §2.1], the Shannon entropy (4) is obtained axiomatically, assuming entropy is
additive and multiply differentiable.

The cross-entropy or directed divergence function D can also be obtained using the axiomatic
approach [13, 14, 16, 23]. Its governing axioms are broadly similar to those for the
Shannon entropy, except that it is convex, and the equilibrium distribution p* = q in the
absence of other constraints. Both the MaxEnt and MinXEnt principles themselves
have also been justified axiomatically [e.g. 23, 24].



Whilst mathematically sound and of tremendous utility, the axiomatic approach is
intellectually unsatisfying in that it presents an austere, sterile basis for the entropy and
cross-entropy functions, based only on abstract notions of desirable properties. The answer
to the question - what is entropy? - is still not clear. Further, as Kapur [43, p209] notes:
"mathematicians tried to modify these axioms to get more general measures [of uncertainty]
including Shannon's measure as a special or limiting case". Other entropy functions, which
do not reduce to the Shannon entropy, have also been derived using different sets of axioms
[e.g. 34, 35, 36, 44, 47, 50, 51]. Other measures of divergence have also been
proposed [e.g. 45, 46, 51]. How can we be certain that the axioms used to derive the
Shannon or Kullback-Leibler measures are correct? Indeed, the specification of particular
axioms may preclude the identification of different or broader measures of entropy, which 
may be more appropriate for particular or more general circumstances. To resolve these 
circular arguments, we now turn to consideration of the combinatorial basis of the entropy 
and cross-entropy functions, which as will be shown, should be recognized as their primary 
(most primitive) philosophical basis. 



C. The Combinatorial (Statistical Mechanical) Approach 



1. Statistics of Multinomial Systems 

The combinatorial approach was first developed in statistical thermodynamics, to examine
the distribution of molecules amongst energy levels or phase space elements [e.g.
32, 33, 82, 83, 84, 86, 87, 89]. However, the combinatorial basis is only
touched upon by many prominent statistical mechanics texts [e.g. 103] in favour of a quantum
mechanical treatment, which tends to disguise its statistical foundation. The connection
between combinatorial concepts and entropy is not prominent in the information theory
literature, although there are a number of notable exceptions [e.g. 10, 11, 18, 90, 91].

Consider the "balls-in-boxes" system illustrated in Figure 1a, in which N distinguishable
balls or entities are distributed amongst s distinguishable boxes or states. This may be taken
to represent N molecules amongst s energy levels, phase space elements or eigenfunctions$^4$;
N ensemble members amongst s ensemble energy values; N people amongst s shops; N cars
amongst s floors of a parking station, and so on. We consider each realization of the system,
defined to contain $n_1$ balls in box 1, $n_2$ balls in box 2, etc, or in general $n_i$ balls in box i.
The N balls are taken to be distinguishable, but their permutations within each box are
indistinguishable, i.e. we can only (or need only) distinguish the balls within any given box
from those in the other boxes. Each choice (of a ball in a box) is assumed independent of
the other selections. The probability of any particular realization of the system, P (equal to
the probability that there are $n_i$ balls in the ith box, for each i), is given by the multinomial



4 The boxes are here taken to be discrete, although there is no conceptual difficulty in generalizing the
analysis to boxes of infinitesimal spacing. Similarly, the number of states s is considered finite, but the
limit $s \to \infty$ can be considered if handled carefully [15].
FIG. 1: Multinomial (a) balls-in-boxes and (b) multiple selection systems.



distribution [104, 105, 106]:

$$P(\mathbf{n}|\mathbf{q}, N, s) = \frac{N!}{n_1! \, n_2! \cdots n_s!} \prod_{i=1}^{s} q_i^{n_i} \tag{16}$$



where again $q_i$ is the prior probability of a ball falling in the ith box, and n = {$n_i$}. If the
prior distribution q is equated to the uniform distribution u (i.e. $q_i = u = 1/s$, $\forall i$) this
reduces to:

$$P(\mathbf{n}|\mathbf{u}, N, s) = \frac{N!}{\prod_{i=1}^{s} n_i!} \, s^{-N} \tag{17}$$



Since the total number of configurations of a multinomial distribution is $s^N$ [107], the
number of ways in which any particular realization in (17) can be produced, or its statistical
weight, is:

$$W = \frac{N!}{\prod_{i=1}^{s} n_i!} \tag{18}$$
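
The relations between (16), (17) and (18) can be verified by brute force for a small system. The following is a sketch with illustrative function names and an assumed example of N = 4 balls in s = 3 boxes; `math.prod` requires Python 3.8 or later.

```python
import math

def weight(n):
    """Statistical weight W = N! / prod_i n_i! of a realization n, as in (18)."""
    w = math.factorial(sum(n))
    for ni in n:
        w //= math.factorial(ni)   # each intermediate quotient is an integer
    return w

def multinomial_probability(n, q):
    """P(n|q, N, s) = W * prod_i q_i^n_i, as in (16)."""
    return weight(n) * math.prod(qi ** ni for qi, ni in zip(q, n))

def realizations(N, s):
    """Yield every occupancy vector (n_1, ..., n_s) with sum N."""
    if s == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in realizations(N - first, s - 1):
            yield (first,) + rest

N, s = 4, 3
total_weight = sum(weight(n) for n in realizations(N, s))
print(total_weight, s ** N)        # the weights sum to s^N configurations

q = [0.2, 0.3, 0.5]                # assumed prior probabilities
print(sum(multinomial_probability(n, q) for n in realizations(N, s)))
```

For a uniform prior, dividing each weight by s^N recovers (17).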



For constant N, the above equations are subject to the natural constraint:

$$C0: \quad \sum_{i=1}^{s} n_i = N \tag{19}$$

and usually one or several moment constraints:

$$C1 \text{ to } CR: \quad \sum_{i=1}^{s} n_i f_{ri} = N \langle f_r \rangle, \qquad r = 1, \ldots, R \tag{20}$$
where $f_{ri}$ is the value of the function $f_r$ in the ith state and $\langle f_r \rangle$ is the mathematical
expectation of $f_{ri}$. An example of (20) is an energy constraint, in which each state is of
energy $f_{1i} = \varepsilon_i$ and the expectation of the energy is $\langle f_1 \rangle = \langle \varepsilon \rangle$.

Now consider a sequence of v independent and identically distributed (i.i.d.) probabilistic
events, within each of which w trials or selections are made between s distinguishable states,
as represented in Figure 1b. Examples include tosses of a coin or coins, throws of a die or
dice, spins of a roulette wheel, choices of symbols to make up a communications signal, or
the sexual liaisons of a leading film star. So long as we are only interested in the statistical
nature of the selections, and not their order, the probability of any realization or type
(without regard to order, assuming each event is independent) also follows the multinomial
distribution (16) with N = vw. When only one selection is made in each event (i.e. w = 1),
then N = v. When the prior probabilities of each state within each selection are identical,
the weight also follows (18).



2. The Most Probable Realization 



We now use first combinatorial principles to determine the most probable realization of
the multinomial systems considered. As mentioned, the following derivation is common in
statistical thermodynamics [e.g. 32, 33, 82, 83, 85, 86, 89], although
such workers base their derivations on the weight W. As it is based on P rather than
W, the following derivation incorporates the prior distribution q, and is therefore more
comprehensive [91].



Clearly, the most probable realization is that for which P (16) is a maximum, subject to
the constraints C0-CR on the system ((19)-(20)). As the natural logarithm ln x increases
monotonically with x, but transforms a product into a sum, it is convenient - and equivalent -
to maximize ln P rather than P itself, a convention adopted (implicitly) throughout statistical
mechanics [3]. (The use of logarithms is therefore merely a matter of convenience, not a
requirement.) The most probable realization is given by:

$$d(\ln P \,|\, \text{constraints}) = 0 \tag{21}$$



where d( ) is the total derivative or variational operator. Now (21) can be constructed using
Lagrange's method of undetermined multipliers [89], involving extremization of the
Lagrangian $\mathcal{L}$:

$$d\mathcal{L} = 0 \tag{22}$$

From the multinomial distribution (16):

$$\ln P = \sum_{i=1}^{s} \left( \frac{n_i}{N} \ln N! - \ln n_i! + n_i \ln q_i \right) \tag{23}$$



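in which (for reasons which will be explained) the leading ln N! term is brought inside the
summation using the natural constraint (19). From the constraints (19)-(20), the Lagrangian
is:

$$\mathcal{L} = \sum_{i=1}^{s} \left( \frac{n_i}{N} \ln N! - \ln n_i! + n_i \ln q_i \right) - (\lambda_0 - 1) \left( \sum_{i=1}^{s} n_i - N \right) - \sum_{r=1}^{R} \lambda_r \left( \sum_{i=1}^{s} n_i f_{ri} - N \langle f_r \rangle \right) \tag{24}$$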

where $\lambda_r$, $r = 0, \ldots, R$, are the Lagrangian multipliers, and $\lambda_0 - 1$ is chosen rather than $\lambda_0$ for
mathematical convenience. For constant N and $\langle f_r \rangle$, and for $f_{ri}$ independent of $n_i$, we
need only consider the variation of (24) with respect to $n_i$, i.e. $(\partial \mathcal{L} / \partial n_i) \, dn_i = 0$, $\forall i$, whence:

$$\frac{1}{N} \ln N! - \frac{\partial \ln n_i!}{\partial n_i} + \ln q_i - (\lambda_0 - 1) - \sum_{r=1}^{R} \lambda_r f_{ri} = 0, \qquad i = 1, \ldots, s \tag{25}$$

The above equations are expressed in terms of $n_i$, and can be said to be in "$n_i$ form."

At this stage the near-universal approach taken in the literature (see previous statistical
mechanics references) is to employ a truncated form of the approximation for factorials
derived by Stirling [92] and de Moivre [93]:

$$ \ln x! \approx x \ln x - x \tag{26} $$



This is accurate to within 1% of $\ln x!$ for $x > 90$; a more precise form, $\ln x! \approx x \ln x - x + \tfrac{1}{2} \ln(2 \pi x)$,
is accurate to within 1% of $\ln x!$ for $x > 4$ [105]. (Strictly speaking, the limit
must be taken using upper and lower bounds, as treated in large deviations theory [80, 81],
but for the purpose of the present study, the Stirling approximation yields the same result.)
Thus $\partial \ln n_i! / \partial n_i \approx \ln n_i$ and $\ln N! \approx N \ln N - N$, and so the most probable realization,
here designated with an asterisk, is obtained from (25) in conjunction with C0 (19) as [c.f. 20]:

$$ n_i^* | q_i \approx N q_i \exp\left( -\lambda_0 - \sum_{r=1}^{R} \lambda_r f_{ri} \right) = \frac{N}{Z_q} q_i \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right), \quad i = 1, \ldots, s \tag{27} $$

or

$$ p_i^* | q_i = \frac{n_i^*}{N} \approx q_i \exp\left( -\lambda_0 - \sum_{r=1}^{R} \lambda_r f_{ri} \right) = \frac{1}{Z_q} q_i \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right), \quad i = 1, \ldots, s \tag{28} $$

with

$$ Z_q = e^{\lambda_0} = \sum_{i=1}^{s} q_i \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right) \tag{29} $$

where $p_i$ is the proportion or probability of entities in each state $i$. Equations (27)-(28)
can be termed the generalized Maxwell-Boltzmann distribution, whilst $Z_q$ is the generalized
partition function and $\lambda_0 = \ln Z_q$ is the generalized Massieu function (strictly speaking, its
negative [9, 111]). The Lagrangian multipliers are obtained from the constraints Cr (20)
and/or more readily from moment calculations (see §II C 6).
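
The derivation above can be illustrated numerically. The following Python sketch evaluates the generalized Maxwell-Boltzmann form (28)-(29) for a single moment constraint (R = 1) and compares it with a brute-force search for the most probable multinomial realization at finite N. The three-state example (f = 0, 1, 2, mean 0.5, N = 60, uniform prior), the function names and the bisection bracket are illustrative assumptions, not taken from the source.

```python
import math

def maxprob_distribution(q, f, mean_f, lo=-50.0, hi=50.0, iters=200):
    """Solve p_i = q_i exp(-lam f_i) / Z_q for the multiplier lam that
    reproduces the target mean <f> (one moment constraint, R = 1)."""
    def mean_at(lam):
        w = [qi * math.exp(-lam * fi) for qi, fi in zip(q, f)]
        return sum(wi * fi for wi, fi in zip(w, f)) / sum(w)
    # <f> decreases monotonically with lam, so bisection is sufficient.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > mean_f:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [qi * math.exp(-lam * fi) for qi, fi in zip(q, f)]
    z_q = sum(w)  # generalized partition function, eq. (29)
    return lam, z_q, [wi / z_q for wi in w]

def log_multinomial(n, q):
    """ln P of the multinomial distribution with prior q (exact, no Stirling)."""
    return (math.lgamma(sum(n) + 1)
            - sum(math.lgamma(ni + 1) for ni in n)
            + sum(ni * math.log(qi) for ni, qi in zip(n, q) if ni > 0))

def brute_force_most_probable(q, big_n, total_f):
    """Enumerate realizations (n0, n1, n2) of a three-state system with
    f = (0, 1, 2), sum n_i = N and sum n_i f_i = total_f; keep the best."""
    best, best_lnp = None, -math.inf
    for n2 in range(total_f // 2 + 1):
        n1 = total_f - 2 * n2
        n0 = big_n - n1 - n2
        if n0 < 0:
            continue
        lnp = log_multinomial((n0, n1, n2), q)
        if lnp > best_lnp:
            best, best_lnp = (n0, n1, n2), lnp
    return best

q = [1 / 3, 1 / 3, 1 / 3]   # assumed uniform prior
f = [0, 1, 2]               # assumed constraint function f_i
N = 60
lam, z_q, p_star = maxprob_distribution(q, f, mean_f=0.5)
n_best = brute_force_most_probable(q, N, total_f=30)
print("Stirling-limit p*   :", [round(p, 4) for p in p_star])
print("finite-N optimum n/N:", [round(n / N, 4) for n in n_best])
```

In this small example the exact finite-N optimum and the Stirling-limit distribution agree to within about one part in a thousand, without any entropy function being invoked.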
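If q = u, (28) reduces to:

$$ p_i^* | u \approx \frac{1}{Z} \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right), \quad i = 1, \ldots, s; \qquad Z = \sum_{i=1}^{s} \exp\left( -\sum_{r=1}^{R} \lambda_r f_{ri} \right) \tag{30} $$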

This is the more commonly reported, generalized Maxwell-Boltzmann distribution of statistical
thermodynamics and information theory, and $Z$ is the usual generalized partition
function [14]. Eq. (30) is obtained directly if either $\ln P_u$ (17) or $\ln W$ (18) is used in
the Lagrangian (24) instead of $\ln P$.

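In the information literature, it is customary to cast the analysis in terms of $p_i$ rather
than $n_i$, thus in "$p_i$ form" [7, 8]. The constraints are:

$$ \text{C0}: \quad \sum_{i=1}^{s} p_i = 1 \tag{31} $$

$$ \text{C1 to CR}: \quad \sum_{i=1}^{s} p_i f_{ri} = \langle f_r \rangle, \quad r = 1, \ldots, R \tag{32} $$

hence the Lagrangian (24) is:

$$ \mathcal{L} = \sum_{i=1}^{s} \left( p_i \ln N! - \ln[(p_i N)!] + p_i N \ln q_i \right) - (\mu_0 - N) \left( \sum_{i=1}^{s} p_i - 1 \right) - \sum_{r=1}^{R} \mu_r \left( \sum_{i=1}^{s} p_i f_{ri} - \langle f_r \rangle \right) \tag{33} $$

where $\mu_r$, $r = 0, \ldots, R$, are the new Lagrangian multipliers, and $(\mu_0 - N)$ is used for convenience.
Taking the variation and applying the Stirling approximation gives:

$$ p_i^* | q_i \approx q_i \exp\left( -\frac{\mu_0}{N} - \sum_{r=1}^{R} \frac{\mu_r}{N} f_{ri} \right) = \frac{1}{Z_q'} q_i \exp\left( -\sum_{r=1}^{R} \frac{\mu_r}{N} f_{ri} \right), \quad i = 1, \ldots, s \tag{34} $$

with

$$ Z_q' = e^{\mu_0 / N} = \sum_{i=1}^{s} q_i \exp\left( -\sum_{r=1}^{R} \frac{\mu_r}{N} f_{ri} \right) \tag{35} $$

This is identical to (28)-(29), with $\lambda_r = \mu_r / N$, $r = 0, \ldots, R$, and $Z_q' = Z_q$. The Lagrangian
multipliers are again obtained from the constraints (32).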

It is worth commenting that if the leading $\ln N!$ term is not brought inside the summation
in (23), but discarded - the approach of all previous workers - the resulting distribution $p_i^*$
contains an additional dependence on $N^{-1}$, which cancels out when forming the partition
function $Z'' = N e^{\lambda_0}$. It therefore has no effect on traditional statistical mechanics. The
distinction is, however, important in the development of exact (finite-$N$) statistical mechanics,
which does not use the Stirling approximation [42, 43]. Furthermore, by virtue of the
natural constraint (19), the $n_i$ form of a problem contains knowledge of $N$, and so it is
unnecessary to include $N$ as a separate parameter. In contrast, a problem specified in its $p_i$
form does not contain - of itself - any information about $N$; if needed, this must be specified
separately.

From the foregoing it is clear that the "most probable" probability distribution for a
multinomial system, subject to arbitrary moment constraints, can be obtained without making
use of an entropy or cross-entropy function. One can instead analyse a probabilistic
system directly using first combinatorial principles. This aspect of entropy theory is
not clearly spelt out in the information theory literature, with only a few exceptions [e.g.
10, 11, 15, 18, 90, 91]. The direct combinatorial approach is extended further in §III B to
encompass systems not of multinomial character.



3. Definition of the Cross-Entropy (Directed Divergence) and Entropy 

Where do the cross-entropy and entropy functions come into the above analyses? Clearly,
they are merely convenient mathematical tools to enable construction of the Lagrangian
equation in $p_i$ form (33). In fact we can define the cross-entropy as "that function which,
when inserted into the Lagrangian in place of $\ln P$, and the extremum of the Lagrangian
is obtained, yields the most probable distribution of the system". The entropy may be
similarly defined as "that function which, when inserted into the Lagrangian in place of
$\ln P_u$ (or $\ln W$), and the extremum of the Lagrangian is obtained, yields the most probable
distribution of the system."

Consider $\ln P$, expressed in $p_i$ form:

$$ \ln P = \sum_{i=1}^{s} \left( p_i \ln N! - \ln[(p_i N)!] + p_i N \ln q_i \right) \tag{36} $$

whence from the Stirling approximation (26) [91]:

$$ \ln P \approx -N \sum_{i=1}^{s} p_i \ln \frac{p_i}{q_i} = -N D \tag{37} $$

Thus the cross-entropy or directed divergence $D$ (5) is simply the negative of the logarithm
of the governing probability distribution, expressed per number of entities present [91].
Maximizing $\ln P$ for a multinomial system subject to the Stirling limits is therefore equivalent
to maximizing $-D$, or minimizing $D$, subject to the constraints on the system. (It does not
matter whether we adopt a positive function, whose minimum yields the most probable
realization, or its negative, whose maximum also yields this realization. By convention, the
cross-entropy is taken here as a positive function to be minimized, although this choice is
arbitrary.)
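
The limiting relation (37) can be checked numerically. The sketch below computes the exact multinomial $\ln P$ with log-gamma functions and compares $-\ln P / N$ with the Kullback-Leibler form $D$ as $N$ grows at fixed proportions. The particular values of p and q, and the function names, are illustrative assumptions only.

```python
import math

def log_multinomial(n, q):
    """Exact ln P of the multinomial distribution with prior q."""
    return (math.lgamma(sum(n) + 1)
            - sum(math.lgamma(ni + 1) for ni in n)
            + sum(ni * math.log(qi) for ni, qi in zip(n, q)))

def cross_entropy(p, q):
    """Kullback-Leibler cross-entropy D = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

base_counts = (5, 3, 2)     # assumed; fixes p = (0.5, 0.3, 0.2) exactly
q = (0.2, 0.3, 0.5)         # assumed prior
p = [c / sum(base_counts) for c in base_counts]
D = cross_entropy(p, q)

gaps = []
for k in (1, 10, 100, 1000, 10**5):
    n = [c * k for c in base_counts]   # same proportions, larger N
    N = sum(n)
    per_entity = -log_multinomial(n, q) / N
    gaps.append(per_entity - D)
    print(f"N = {N:>8d}   -ln(P)/N = {per_entity:.6f}   D = {D:.6f}")
```

The gap between the two columns shrinks roughly as $\ln N / N$, which is the finite-$N$ correction that the truncated Stirling form (26) discards.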



Similarly if we consider $\ln P_u$, from (17) and (37) the Stirling form is [3, 10, 91]:

$$ \ln P_u \approx -N \sum_{i=1}^{s} p_i \ln (s\, p_i) = -N \ln s + N H \tag{38} $$

This is proportional to the Shannon entropy (4), shifted by a constant. Maximizing $\ln P_u$
subject to the Stirling limits and constraints is therefore equivalent to maximizing $H$, subject
to the same constraints [91]. Indeed, from (18):

$$ \ln W \approx -N \sum_{i=1}^{s} p_i \ln p_i = N H \tag{39} $$

This definition of entropy for a multinomial system accords with the probabilistic expressions 
of Boltzmann and Shannon (Tj0) . 

It is therefore seen that the Kullback-Leibler cross-entropy and Shannon entropy functions
are simplified forms of the logarithm of the multinomial distribution (16), expressed per
unit entity. The MinXEnt and MaxEnt principles therefore provide simplified methods to
determine the most probable realization of a multinomial system, subject to its constraints
(succinctly termed "MaxProb" by Grendar and Grendar [76]). The cross-entropy is the
more generic of the two functions, in that it contains the prior probabilities $q_i$.

Of the three theoretical roots of the entropy and cross-entropy functions, the combinato- 
rial approach is therefore the most intellectually satisfying in that it provides a direct answer 
to the question: what is entropy? There is no circular argument: entropy and cross-entropy 
are firmly based on simple combinatorial principles. In consequence, there is no need to 
imbue either the MinXEnt or MaxEnt principles, or the cross-entropy or entropy functions 
themselves, with the kind of mystique with which they have been associated for well over a 
century. There is no mystery at all. In later sections, the foregoing analysis is generalized 
to any probabilistic system, irrespective of whether it is of multinomial character. 

4. Equivalence of Reference States

It is necessary to be extremely careful about the definitions of the cross-entropy and
entropy functions, given in §II C 3. To this end, note that obtaining the extremum of the
Lagrangian ((24) or (33)) necessitates extremization, whether it contains $\ln P$ or its substitute,
$-D$ (or whether $\ln W$ or $H$, if q = u). The relationship between these quantities is
therefore:

$$ d(-D(\mathbf{p}|\mathbf{q})) = \frac{1}{N}\, d(\ln P) \tag{40} $$

(In the present analysis, $\ln P$ in the Lagrangian can be multiplied by any arbitrary
positive constant $K$, and still give the same distribution, and so we could relax (40) further
by extremizing the scaled negative cross-entropy $-KD$. This explains why we can use the
scaled entropy $S = kH$ throughout thermodynamics, without affecting any calculations.)
Correspondence between the $i$th terms of $D$ and $\ln P$ gives:

$$ -\frac{\partial}{\partial p_i} D_i(p_i|q_i)\, dp_i = \frac{1}{N} \frac{\partial}{\partial p_i} \ln P_i \, dp_i, \quad i = 1, \ldots, s \tag{41} $$

where

$$ D(\mathbf{p}|\mathbf{q}) = \sum_{i=1}^{s} D_i(p_i|q_i), \qquad \ln P = \sum_{i=1}^{s} \ln P_i $$

Integration with respect to $p_i$ and summation gives:

$$ -D(\mathbf{p}|\mathbf{q}) = \frac{1}{N} \sum_{i=1}^{s} \int \frac{\partial \ln P_i}{\partial p_i}\, dp_i = \frac{\ln P}{N} + C \tag{42} $$

where $C$ is a constant of integration. In consequence, the multinomial cross-entropy (5) and
entropy (4) could have been given respectively as (or any multiple of):

$$ D(\mathbf{p}|\mathbf{q}) = C + \sum_{i=1}^{s} p_i \ln \frac{p_i}{q_i} = \sum_{i=1}^{s} \left( C p_i + p_i \ln \frac{p_i}{q_i} \right) \tag{43} $$

$$ H(\mathbf{p}) = -C - \sum_{i=1}^{s} p_i \ln p_i = -\sum_{i=1}^{s} \left( C p_i + p_i \ln p_i \right) \tag{44} $$

However, the axiomatic definitions of these functions require that they obey the decisivity
property (§II B), i.e. $D = H = 0$ when $\{p_i = 1, i = j;\ p_i = 0, i \neq j\}$, from which $C = 0$,
producing the recognized forms of the above functions ((5), (4)). This causes the $N \ln s$
term to be dropped from the definition of $H$ (38). Note, however, that the choice of $C$
has no impact on the application of $D$ or $H$ to determine the most probable realization.
(In other words, as is recognized throughout science and engineering, all zero reference or
datum positions for the cross-entropy and entropy - and hence for information and energy -
are mathematically equivalent.)



5. Ensemble Theory and Multicomponent Systems 

In its application to thermodynamics, one aspect of statistical mechanics has caused needless
conceptual difficulty: the use of ensembles to represent particular types of systems [112].
Most common are the microcanonical ensemble, representing a closed system of fixed en- 
ergy; the canonical ensemble, a closed system of fixed temperature; and the grand canonical 
ensemble, an open system of fixed temperature and mean composition. From the foregoing 
discussion, it is evident that an ensemble is simply the set of all possible realizations - each 
weighted by its number of permutations (or for unequal qi, by the probability of each real- 
ization) - consistent with a particular system specification; i.e. consistent with a specified 
governing probability distribution $P$, total number of entities $N$ (or numbers of entities of
different types), number of states $s$, and specified constraints $\langle f_r \rangle$ or their equivalent
Lagrangian multipliers $\lambda_r$, $r = 1, \ldots, R$. An ensemble is therefore a mental construct, which
does not require a physical manifestation.

As an example, consider a closed physical system in which the entities fluctuate between
states (the elemental chaos of Planck [5]), such as gas molecules in a container. Such a system
will migrate from one realization to another, and thence between different members of its 
ensemble (it will describe a trajectory in - for example - energetic, geometric or phase space). 
However, there is no need to require that the system must access every realization within 
a particular time frame, nor even that it should come arbitrarily close to every realization;
the only requirement of probability theory is that each realization included in the ensemble 
be realizable, to the extent given by its assigned probability. As every gambler or insurance 
broker will testify, probabilities are not certainties. Unfortunately, a great deal of erroneous 
reasoning has been put forth on this topic, which still clouds our present-day understanding. 

In contrast, consider a "multiple selection" system as defined in §II C 1, such as a set
of throws of a coin or rolls of a die. In this case, the ensemble can only ever be a mental 
construct, representing the set of all possible outcomes. Once the "die is cast", the ensemble 
ceases to have any meaning, except as a reminder of what "might have been" . 

The microcanonical and canonical ensembles are both based on the multinomial distribution
(16), with different interpretations. In the (generalized) microcanonical ensemble, $N$
represents the total number of non-interacting particles, each of which is deemed to possess
its own "private" functions $f_{ri}$. The constraints $\langle f_r \rangle$ can therefore be considered constant. In
contrast, in the (generalized) canonical ensemble, $N$ is now the number of separate systems
(this is more clearly denoted $\mathcal{N}$ [89]), each of which contains a constant number of particles,
all subject to baths of constant $\lambda_r$, $r = 1, \ldots, R$. By this device, the canonical ensemble can
be used to examine systems containing interacting particles 5 or other coupling effects, thus
in which the $f_{ri}$ (hence the $\langle f_r \rangle$) can be functions of the realization, even though the $\lambda_r$
are fixed [see 83, 112, 113, 114]. In other words, the canonical ensemble represents "the
set of realizations of the set of realizations of interacting particles." This superset cannot
readily be reduced to the lower (microcanonical) set unless the particles are non-interacting.
Despite this distinction, by the use of baths of "generalized heat" (see §II C 6), the canonical
ensemble is analysed by the same mathematical treatment as the microcanonical [3, 83].



The generalized grand canonical ensemble is normally taken to consist of N separate 



5 The precise definition of "interacting" remains open. Some workers prefer to qualify this statement, by 
considering only "weakly interacting" particles [e.g.0,[8^| or those without "long-range interactions" [e.g. 
systems, in which there are $n_{\{N_l\},i}$ systems containing $N_l$ particles each of the $l$th type in
the $i$th state, for $l = 1, \ldots, L$, where $L$ is the number of independent species. (For reactive
species, it is necessary to define a minimum set of $L$ species, from which all other species
can be formed by reaction [115].) Since the system is open, each $N_l$ is permitted to vary
between zero and (effectively) infinity. Expressed in terms of $P$ rather than $W$, the governing
distribution is generally assumed to be "multiply multinomial" [c.f. 84, 86, 87]:

$$ P_{GC} = \mathcal{N}! \prod_{N_1=0}^{\infty} \cdots \prod_{N_L=0}^{\infty} \prod_{i=1}^{s} \frac{q_{\{N_l\},i}^{\,n_{\{N_l\},i}}}{n_{\{N_l\},i}!} \tag{45} $$



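where $q_{\{N_l\},i}$ is the prior probability of a system which contains $N_l$ particles of each $l$th type
in the $i$th state. This is normally subject to natural, moment and mean number of each
type of entity constraints:

$$ \text{C0}: \quad \sum_{N_1=0}^{\infty} \cdots \sum_{N_L=0}^{\infty} \sum_{i=1}^{s} n_{\{N_l\},i} = \mathcal{N} \tag{46} $$

$$ \text{Cr}: \quad \sum_{N_1=0}^{\infty} \cdots \sum_{N_L=0}^{\infty} \sum_{i=1}^{s} n_{\{N_l\},i}\, f_{r\{N_l\},i} = \mathcal{N} \langle f_r \rangle, \quad r = 1, \ldots, R \tag{47} $$

$$ \text{Cl}: \quad \sum_{N_1=0}^{\infty} \cdots \sum_{N_L=0}^{\infty} \sum_{i=1}^{s} n_{\{N_l\},i}\, N_l = \mathcal{N} \langle N_l \rangle, \quad l = 1, \ldots, L \tag{48} $$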



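The combinatorial method ((42) and §II C 2) gives the Stirling-approximate cross-entropy
and equilibrium distribution:

$$ -D_{GC} = \frac{1}{\mathcal{N}} \ln P_{GC} \approx -\sum_{N_1=0}^{\infty} \cdots \sum_{N_L=0}^{\infty} \sum_{i=1}^{s} p_{\{N_l\},i} \ln \frac{p_{\{N_l\},i}}{q_{\{N_l\},i}} \tag{49} $$

$$ p_{\{N_l\},i}^* = \frac{1}{\Xi_q}\, q_{\{N_l\},i} \exp\left( -\sum_{r=1}^{R} \lambda_r f_{r\{N_l\},i} - \sum_{l=1}^{L} \nu_l N_l \right), \quad i = 1, \ldots, s \tag{50} $$

with

$$ \Xi_q = \sum_{N_1=0}^{\infty} \cdots \sum_{N_L=0}^{\infty} \sum_{i=1}^{s} q_{\{N_l\},i} \exp\left( -\sum_{r=1}^{R} \lambda_r f_{r\{N_l\},i} - \sum_{l=1}^{L} \nu_l N_l \right) \tag{51} $$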

where $p_{\{N_l\},i} = n_{\{N_l\},i} / \mathcal{N}$; $\lambda_r$ and $\nu_l$ are Lagrangian multipliers; and $\Xi_q$ is the generalized
grand partition function. The entropy forms follow. However, the cross-entropy will only be
of Kullback-Leibler form if the governing distribution is multinomial (45). If $P$ is of some
other form, for example the product of independent distributions (extending [33]):

$$ P_{GC'} = \prod_{l=1}^{L} P_l = \prod_{l=1}^{L} \left( \sum_{N_l=0}^{\infty} \sum_{i=1}^{s} n_{N_l,i} \right)! \prod_{N_l=0}^{\infty} \prod_{i=1}^{s} \frac{q_{N_l,i}^{\,n_{N_l,i}}}{n_{N_l,i}!} \tag{52} $$

or if we possess some other knowledge (such as of $N_l$), then clearly the resulting multicomponent
cross-entropy and entropy functions and the equilibrium distribution could be quite
different. It is insufficient to simply assert (45) or (49); its adoption must be based on sound
reasoning, and ultimately, be demonstrated by successful predictions.



6. "Jaynes Relations" and Generalized Free Energy Function 



It is now possible to re-examine the generic structure of statistical mechanics developed
by Jaynes [e.g. 7, 11, 12, 13, 14, 15], in light of the combinatorial approach.



The generic structure has the powerful advantage of being applicable to any multinomial 
system, irrespective of the physical nature of the constraints, and is therefore not limited 
to energetic systems. It can however be even further extended. The following discussion is 
a synthesis and extension of previous treatments, building up to the development of major 
new concepts including a generalized Clausius inequality, a generalized free energy function, 
and a generalized Gibbs-Duhem relation and phase rule. Throughout the following (except 
where specified), $\lambda_0 = \ln Z_q$ is assumed to be a function of each $\lambda_r$; the $\lambda_r$ are mutually
independent; each $f_{ri}$ is independent of $\lambda_r$; and each $\langle f_r \rangle$ is a function of $\lambda_r$ but not of the
other multipliers $\lambda_m$, $m \neq r$.

From $p_i^*$ (27)-(28) and the moment constraints (20) it can be shown that [7, 9, 14]:

$$ \frac{\partial \lambda_0}{\partial \lambda_r} = -\langle f_r \rangle \tag{53} $$



The variance and covariances of $f_r$, necessarily in the vicinity of equilibrium, are obtained
by further differentiation 0, E3, LLO, Q: 



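$$ \frac{\partial^2 \lambda_0}{\partial \lambda_r^2} = \mathrm{var}(f_r) = \langle f_r^2 \rangle - \langle f_r \rangle^2 = -\frac{\partial \langle f_r \rangle}{\partial \lambda_r} \tag{54} $$

$$ \frac{\partial^2 \lambda_0}{\partial \lambda_m \partial \lambda_r} = \mathrm{cov}(f_m, f_r) = \langle f_r f_m \rangle - \langle f_r \rangle \langle f_m \rangle = -\frac{\partial \langle f_r \rangle}{\partial \lambda_m} \tag{55} $$

where each $f_{ri}$ is independent of each $\lambda_m$. From (55), $\partial^2 \lambda_0 / \partial \lambda_m \partial \lambda_r = \partial^2 \lambda_0 / \partial \lambda_r \partial \lambda_m$, whence
the coupling coefficients are equal:

$$ \frac{\partial \langle f_r \rangle}{\partial \lambda_m} = \frac{\partial \langle f_m \rangle}{\partial \lambda_r} \tag{56} $$

Eq. (55) is a subset of a more general result [15]:

$$ \mathrm{cov}(g, f_r) = \langle g f_r \rangle - \langle g \rangle \langle f_r \rangle = -\frac{\partial \langle g \rangle}{\partial \lambda_r} \tag{57} $$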
where $\{g_i\}$ is any function of the states $i = 1, \ldots, s$, in which each $g_i$ is independent of $\lambda_r$.

Using the Cauchy-Schwarz inequality $\langle a^2 \rangle \langle b^2 \rangle - \langle ab \rangle^2 \ge 0$ [116] with $a = f_r$, $b = 1$ gives
$\mathrm{var}(f_r) = -\partial \langle f_r \rangle / \partial \lambda_r \ge 0$, whence $\partial \langle f_r \rangle / \partial \lambda_r \le 0$ [13]. Accordingly, $\lambda_r$ decreases
monotonically with increasing $\langle f_r \rangle$. No equivalent relation is available for the mixed derivatives
$\partial \langle f_r \rangle / \partial \lambda_m$. Using the arguments of Kapur and Kesavan [14, §2.4.2; 4.3.2], we find that $\lambda_0$
is a convex function of $\lambda_r$, $r = 1, \ldots, R$.
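
The multiplier relations (53)-(54) are easy to verify by finite differences. The following sketch does so for a single constraint (R = 1) on a four-state system; the prior q, the function values f, the test point and the step size are arbitrary illustrative assumptions.

```python
import math

q = [0.1, 0.2, 0.3, 0.4]    # assumed prior probabilities
f = [0.0, 1.0, 2.0, 5.0]    # assumed constraint function f_i (R = 1)

def lambda0(lam):
    """lambda_0 = ln Z_q for a single multiplier lam, from eq. (29)."""
    return math.log(sum(qi * math.exp(-lam * fi) for qi, fi in zip(q, f)))

def moments(lam):
    """Return (<f>, var(f)) under p_i = q_i exp(-lam f_i) / Z_q."""
    w = [qi * math.exp(-lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    mean = sum(wi * fi for wi, fi in zip(w, f)) / z
    mean_sq = sum(wi * fi * fi for wi, fi in zip(w, f)) / z
    return mean, mean_sq - mean * mean

lam, h = 0.7, 1e-4
mean, var = moments(lam)

# Central differences for the first and second derivatives of lambda_0.
d1 = (lambda0(lam + h) - lambda0(lam - h)) / (2 * h)
d2 = (lambda0(lam + h) - 2 * lambda0(lam) + lambda0(lam - h)) / h**2
# Central difference for d<f>/d(lam).
dmean = (moments(lam + h)[0] - moments(lam - h)[0]) / (2 * h)

print(f"d(lambda_0)/d(lam)     = {d1:.6f}   -<f>    = {-mean:.6f}")
print(f"d2(lambda_0)/d(lam)^2  = {d2:.6f}   var(f)  = {var:.6f}")
print(f"d<f>/d(lam)            = {dmean:.6f}   -var(f) = {-var:.6f}")
```

The positive second derivative is the single-multiplier case of the convexity of $\lambda_0$, and the negative slope of $\langle f \rangle$ shows the monotonic relation between the multiplier and its moment.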

It is also possible to consider $\lambda_0$ and each $f_{ri}$ (hence also $\langle f_r \rangle$) to be functions of parameters
$\alpha_v$, $v = 1, \ldots, V$. By differentiation of the partition function (29) [7, 13], or more directly
by rearrangement of $p_i^*$ ((27)-(28)) and differentiation:

$$ \frac{\partial \lambda_0}{\partial \alpha_v} = -\sum_{r=1}^{R} \lambda_r \left\langle \frac{\partial f_r}{\partial \alpha_v} \right\rangle \tag{58} $$

Alternatively, differentiation of (56) with respect to any continuous function $\alpha_v$ yields (necessarily
in the vicinity of equilibrium, e.g. for a shifting equilibrium position):

$$ \frac{\partial}{\partial \lambda_m} \left( \frac{\partial \langle f_r \rangle}{\partial \alpha_v} \right) = \frac{\partial}{\partial \lambda_r} \left( \frac{\partial \langle f_m \rangle}{\partial \alpha_v} \right) \tag{59} $$

Eq. (59) with $\alpha_v = t$ = time is a statement of Onsager's [117, 118] reciprocal relations.
Various other higher derivative equations in A r and/or a v are given by Jaynes [ijj]. 

Similarly, considering $\lambda_0$ and $\lambda_r$ to be functions of $\beta_j$, $j = 1, \ldots, J$; or $\lambda_0$ alone as a
function of $N$, $n_i^*$ or $p_i^*$, from (27)-(28):



dXo 

dXo 
dN 
dXo 

dn* 

dXo 
dp* 



r=l J 



J 





1 

n* 
1 

Pi 



1 J 



dX 



dn* 

dXo 
dp* 



s 

N 



(60) 

(61) 
(62) 

(63) 



From (61), $\lambda_0$ (and thus $Z_q$) is independent of $N$ in the Stirling limit $N \to \infty$. From (62),
$\partial \lambda_0 / \partial n_i^* \to 0$ in the Stirling limit $n_i^* \to \infty$, hence $\lambda_0$ is independent of the mean degree of
filling of each state.

Using $p_i^*$ ((27)-(28)), the constraints ((19)-(20) or (31)-(32)), the definitions of $H$, $D$ and
$P$ ((4)-(5), (42)) and the multiplier relations ((53)), the minimum cross-entropy or maximum
entropy position is obtained as [c.f. 7, 9-14]:

$$ -D^* = H^* = \lambda_0 + \sum_{r=1}^{R} \lambda_r \langle f_r \rangle = \ln Z_q - \sum_{r=1}^{R} \lambda_r \frac{\partial \ln Z_q}{\partial \lambda_r} \tag{64} $$

with probability:

$$ P^* = A \exp(-N D^*) \tag{65} $$
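where $A$ is a normalising constant (with $P^* < 1$), and we recall that $H^*$ is obtained from
$\ln P_u$ by dropping the $\ln s$ term (or directly from $\ln W$) ((38)-(39)). Equation (64) is one of
the most important equations in equilibrium statistical mechanics - for example giving the
thermodynamic entropy and thence all thermodynamic functions in terms of the applicable
partition function - whilst (65) encompasses Einstein's [119] definition of entropy. Note that
the MinXEnt and MaxEnt positions are of the same form, although q is implicit within $\lambda_0$
in $D^*$. Successive differentiation of (64) with respect to the moments - taking $\lambda_0$ to be
independent of $\langle f_r \rangle$ - gives [c.f. 14, 15]:

$$ -\frac{\partial D^*}{\partial \langle f_r \rangle} = \frac{\partial H^*}{\partial \langle f_r \rangle} = \lambda_r \tag{66} $$

$$ -\frac{\partial^2 D^*}{\partial \langle f_m \rangle \partial \langle f_r \rangle} = \frac{\partial^2 H^*}{\partial \langle f_m \rangle \partial \langle f_r \rangle} = \frac{\partial \lambda_r}{\partial \langle f_m \rangle} = \frac{\partial \lambda_m}{\partial \langle f_r \rangle} \tag{67} $$

whilst differentiation with respect to $\lambda_r$ - now considering $\langle f_r \rangle$ to be a function of $\lambda_m$, $\forall m$ -
and use of (56) gives the Euler relation [c.f. 120]:

$$ -\frac{\partial D^*}{\partial \lambda_r} = \frac{\partial H^*}{\partial \lambda_r} = \sum_{m=1}^{M} \lambda_m \frac{\partial \langle f_m \rangle}{\partial \lambda_r} = \sum_{m=1}^{M} \lambda_m \frac{\partial \langle f_r \rangle}{\partial \lambda_m} \tag{68} $$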



where $M$ and $R$ are numerically equal. From (66), using the same arguments as Kapur &
Kesavan [14, §2.4.4; 4.3.2], we see that $D^*$ (or $H^*$) is a convex (concave) function of the
$\langle f_r \rangle$'s. A multinomial system subject to the Stirling approximation therefore has a single,
unique equilibrium position with respect to its moment constraints.
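
As a numerical sanity check on (64) and (66), the sketch below evaluates the equilibrium cross-entropy $D^*$ directly from its definition and compares it with $\lambda_0 + \lambda \langle f \rangle$, then estimates the slope of $-D^*$ against $\langle f \rangle$ by finite differences. It assumes a single constraint (R = 1); the prior, function values and test point are illustrative assumptions.

```python
import math

q = [0.1, 0.2, 0.3, 0.4]    # assumed prior probabilities
f = [0.0, 1.0, 2.0, 5.0]    # assumed constraint function f_i (R = 1)

def equilibrium(lam):
    """Return (lambda_0, <f>, D*) for p_i = q_i exp(-lam f_i) / Z_q."""
    w = [qi * math.exp(-lam * fi) for qi, fi in zip(q, f)]
    z = sum(w)
    p = [wi / z for wi in w]
    mean = sum(pi * fi for pi, fi in zip(p, f))
    d_star = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return math.log(z), mean, d_star

lam, h = 0.7, 1e-4
lam0, mean, d_star = equilibrium(lam)
_, mean_plus, d_plus = equilibrium(lam + h)
_, mean_minus, d_minus = equilibrium(lam - h)

# Slope of -D* against <f>, traced out by varying the multiplier.
slope = (-d_plus + d_minus) / (mean_plus - mean_minus)

print(f"-D*                 = {-d_star:.8f}")
print(f"lambda_0 + lam <f>  = {lam0 + lam * mean:.8f}")
print(f"d(-D*)/d<f>         = {slope:.6f}   lam = {lam:.6f}")
```

The recovered slope equals the multiplier, which is the statement that each $\lambda_r$ is conjugate to its moment $\langle f_r \rangle$.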

The variation in $D^*$ or $H^*$ due to variations in $\lambda_0$, $\lambda_r$ and $\langle f_r \rangle$ (and also $N$) is [c.f. 15]:

$$ -dD^* = dH^* = \sum_{r=1}^{R} \lambda_r \left( d\langle f_r \rangle - \langle df_r \rangle \right) = \sum_{r=1}^{R} \lambda_r\, dQ_r \tag{69} $$

where we can interpret $d\langle f_r \rangle = dU_r$, $\langle df_r \rangle = \sum_{i=1}^{s} p_i\, df_{ri} = dW_r$ and $d\langle f_r \rangle - \langle df_r \rangle =
\sum_{i=1}^{s} f_{ri}\, dp_i = dQ_r$ respectively as changes in the $r$th type of "energy", "generalized work"
on the system and "generalized heat" delivered to the system, whence (as defined here)
$dU_r = dQ_r + dW_r$. Note that in the above derivation, the variations in $\lambda_r$ cancel out [10, 15],
hence (69) encompasses conditions of either constant or variable $\lambda_r$. Equation (69) is a
superset of the Clausius relation (2), and so for each type of "generalized heat" there exists
a conjugate integrating factor $\lambda_r$. As with the Clausius relation, the $\lambda_r$ are properties of the
system of interest (i.e. the one into which positive generalized heat is delivered).

Equation (69) applies to a reversible process, i.e. to an incremental change in the equilibrium
position. If we also include spontaneous irreversible processes (involving a system
not necessarily at equilibrium), for which the cross-entropy can decrease (or entropy can
increase) without generalized heat input, we see that:

$$ -dD = dH \ge \sum_{r=1}^{R} \lambda_r\, dQ_r \tag{70} $$

This is a superset of the Clausius inequality (2). Equation (70) can be rearranged, in the
manner of Gibbs [115], to give the differential form of a generic dimensionless free
energy function $\Phi$, here termed the free information 6:

$$ d\Phi = \left\{ \begin{matrix} dD + \sum_{r=1}^{R} \lambda_r\, dQ_r \\ -dH + \sum_{r=1}^{R} \lambda_r\, dQ_r \end{matrix} \right\} \le 0 \tag{71} $$

(whence $d\Phi^* = 0$ at a fixed equilibrium position), where the upper form incorporates the
prior probabilities q. Now from (64):

$$ -dD^* = dH^* = d\lambda_0 + \sum_{r=1}^{R} d\lambda_r \langle f_r \rangle + \sum_{r=1}^{R} \lambda_r\, d\langle f_r \rangle \tag{72} $$

so if we set $dD = dD^* + dD^{irrev}$ and $dH = dH^* + dH^{irrev}$ (with $dD^{irrev} \le 0$ and $dH^{irrev} \ge 0$),
where superscript irrev denotes the irreversible component, then from (71)-(72):

$$ d\Phi = \left\{ \begin{matrix} -d\lambda_0 - \sum_{r=1}^{R} d\lambda_r \langle f_r \rangle + dD^{irrev} - \sum_{r=1}^{R} \lambda_r\, dW_r \\ -d\lambda_0 - \sum_{r=1}^{R} d\lambda_r \langle f_r \rangle - dH^{irrev} - \sum_{r=1}^{R} \lambda_r\, dW_r \end{matrix} \right\} \le 0 \tag{73} $$

6 This is quite distinct from the "free physical information" of Frieden [94].

If - and only if - there is no change in $\lambda_r$ (i.e. no change in any contacting bath; see also
(77) below), no reversible generalized work on the system (apart from that already included
in the constraints) and no irreversible process, then:

$$ d\Phi^* = -d\lambda_0 = -d \ln Z_q \tag{74} $$

where $Z_q$ is the applicable partition function ((29) or (30)). Alternatively, from (73), if there
is no change in $\lambda_0$ or $\lambda_r$ and no irreversible process:

$$ d\Phi = -\sum_{r=1}^{R} \lambda_r\, dW_r \le 0 \tag{75} $$

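$\Phi$ therefore indicates the maximum available weighted generalized work per entity which
can be obtained from a system.

Integration of (71) gives the state function:

$$ \Phi = \left\{ \begin{matrix} D + \sum_{r=1}^{R} \lambda_r Q_r \\ -H + \sum_{r=1}^{R} \lambda_r Q_r \end{matrix} \right. \tag{76} $$

where $Q_r = \int dQ_r = \int d\langle f_r \rangle - \int dW_r$ defines each absolute generalized heat 7. Comparing
its differential with (71) gives:

$$ \sum_{r=1}^{R} Q_r\, d\lambda_r = 0 \tag{77} $$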



This is a superset of the Gibbs-Duhem equation [115]. For a system containing separate
coexistent phases, or bodies which differ in composition or state (as defined by Gibbs [115]),
there will be one such equation for each phase. For $L$ independent constituents, $\tau = R - L$
other constraints (not including the $L$ constituents) and $p$ phases, (77) thus yields a
generalized Gibbs' phase rule for the number of degrees of freedom of a system [c.f. 85, 115]:

$$ f = L + \tau - p = R - p \tag{78} $$

In other words, the system will be fully determined by $R - p$ independent parameters, from
the set of $R$ constraints or (more commonly) their corresponding Lagrangian multipliers.

Equations (64), (71) and (73)-(78) form the basis of present-day thermodynamics. For
energetic systems, $d\Phi$ is normally divided by the energetic multiplier $\lambda_1 = 1/kT$; e.g. for
an energetic system which can exchange heat with its surroundings, but not work or mass,
at constant volume, $dQ_1 = dU$, $dS = k\, dH$, $dA = kT\, d\Phi = dU - T\, dS \le 0$ and $dA^* =
-kT\, d \ln Z$, where $U$ is the mean internal energy per entity, $A$ is the Helmholtz free energy
per entity and $Z$ is the microcanonical or canonical partition function 8. For a grand
canonical system with $L$ independent constituents which can exchange heat and mass with
its surroundings, but not work except for $PV$-work, at constant pressure, $dQ_1 = dU$, $\lambda_1 =
1/kT$, $dQ_2 = dV$, $\lambda_2 = P/kT$, $dQ_{2+l} = dm_l$, $\lambda_{2+l} = -\mu_l/kT = -\ln \alpha_l$, $dG = kT\, d\Phi =
dU - T\, dS + P\, dV - \sum_{l=1}^{L} \mu_l\, dm_l \le 0$, $dG^* = -kT\, d \ln \Xi$ and $f = L + 2 - p$, where $P$ is
pressure, $V$ is mean volume per entity, $\mu_l$ is the chemical potential and $\alpha_l$ is the "absolute"
(unscaled) chemical activity of the $l$th constituent, $m_l$ is the mean number of entities of
$l$th type per entity, $G$ is the Gibbs free energy per entity and $\Xi$ is the grand canonical
partition function. The essergy $Y = kT_0 \Phi = E - T_0 S + P_0 V - \sum_l \mu_{l0} m_l$ is a scaled $\Phi$ of
a system with total internal energy $E$, in contact with a bath of reference temperature $T_0$,
pressure $P_0$ and chemical potentials $\{\mu_{l0}\}$ [121]. Essergy is thus an extended free energy
calculated with reference to the bath (e.g. the external environment), not to the system.
The exergy $X = Y - Y_0$ is the difference between the essergy of a system (by early authors,
with the chemical potential terms omitted), and of the same system in equilibrium with
the bath [e.g. 121, 122, 123, 124, 125, 126, 127, 128]. Exergy therefore represents the
maximum work deliverable to the environment, by allowing a system to reach equilibrium
with that environment. The statistical extropy [129, 130, 131] is a modified free information
defined with respect to the bath - with all generalized work terms set to zero (i.e. $Q_r \approx
\langle f_r \rangle$) - less the modified free information at equilibrium. Exergy forms the nucleus of the
interrelated fields of thermoeconomics and exergo-economics for resource management and
process optimization [127, 132, 133], whilst both exergy and extropy have been used as
measures of environmental impact, i.e. as quantitative tools within and/or complementary
to the framework of environmental life cycle assessment [128, 129, 130, 134, 135].

7 In thermodynamic systems, this is generally approximated as $Q_r \approx \langle f_r \rangle$, i.e. assuming each generalized
work term is zero, except for the energy constraint, where the actual heat $Q = \int dQ = \int T\, dS = TS$ at
constant $T$ is used.



Notwithstanding the historical development of this field, it must be emphasized that the
use of $\Phi$ is not restricted to thermodynamic, industrial or environmental systems. Just
as with the information entropy, we can define the free information of any multinomial
system - for example in communications, transport, urban planning, biology, geography,
social science, politics, economics, linguistics, image analysis or any other field - and use
it to examine its (probabilistic) stability. The entire armoury of state functions, cyclic
integrals, efficiency ratios, Gibbs-Duhem and phase relations, Maxwell-like relations and
Jaynes relations - currently considered the exclusive domain of thermodynamics - can then
be brought to bear on the analysis of such systems.

8 The extensive thermodynamic variables (e.g. $U$, $S$, $V$, $m_l$, $A$, $G$) are all mean quantities, expressed in
relevant units per entity. In a microcanonical ensemble, they represent mean values per particle. The total
values are calculated by multiplication by $N$ (the form of (71) remains the same). In a canonical ensemble,
each extensive variable represents the "ensemble mean" or "mean of the total values".



7. "Fluctuations" and Entropy Concentration Theorem 



Although the MinXEnt or MaxEnt distribution is the "most probable" one, it cannot be
a priori assumed to be the exclusive outcome. The sharpness of the predicted distribution
has historically been examined by two methods: the fluctuation criterion of Gibbs [112] and
Einstein [119], and the entropy concentration theorem of Jaynes [12, 15, 136], in part
foreshadowed by Boltzmann [137] and Einstein [138]. The detailed asymptotic convergence
behaviour of the distribution forms the subject of large deviations theory, based on various
mathematical limit theorems [58, 80, 139], and will not be examined further here.

The first method examines the coefficient of variation $\delta$ of each constraining variable (or
its square), commonly termed its "fluctuation" 9. For a microcanonical system, this can be



written as [c.f. [UJ, Ilia ]: 



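$$ \delta(N f_r) = \frac{\sqrt{\mathrm{var}(N f_r)}}{\langle N f_r \rangle} = \frac{\sqrt{\langle (N f_r)^2 \rangle - \langle N f_r \rangle^2}}{\langle N f_r \rangle} \tag{79} $$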

where we are careful with notation to consider the variability about the total extensive
quantity $\langle N f_r \rangle$ for a system of $N$ entities, not the variability of the fixed quantity per entity
$\langle f_r \rangle$. (Of course, $\delta$ does not capture the full picture of the distribution of $N\{f_{ri}\}$, e.g. the
skewness, kurtosis, etc, for which higher order moments must be considered.) The criterion
for sharpness is normally stated as $\delta \ll 1$ [10, 119]. From (54) and (79):

$$ \delta(N f_r) = \frac{1}{\sqrt{N}} \sqrt{ -\frac{1}{\langle f_r \rangle^2} \frac{\partial \langle f_r \rangle}{\partial \lambda_r} } \tag{80} $$



9 The term "fluctuation" is unfortunate, since it implies rapid change about the mean, which has little to
do with the equilibrium position but depends on the system dynamics. $\delta(N f_r)$ is simply a measure of the
"variability" or "spread" of the equilibrium filling of $N\{f_{ri}\}$.
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The term inside the second square root is positive, and in many cases of order unity, where- 
upon 5(Nf r ) « N~ l l 2 — > in the Stirling limit N — > oo. For example, for a microcanonical 
system with f u = e i; (/j) = (e) = {7, Ai = 1/kT, containing an ideal monatomic non- 



interacting gas with U = 7;kT and C v = dU/dT ■ 
capacity per entity, we obtain 5(Ne) = (fiV) -1 / 2 ps N^ 1 / 2 — * [e.g. 



fc, where C„ is the isovolumetric 



heat 



83, 



85 



86 



88| 



88( |) it applies 



10 . Although this result is not general (e.g. in the vicinity of phase changes 
to many physical phenomena, producing what is widely regarded as the overwhelming pre- 
cision of thermodynamics. If valid, the "jV~ 1//2 rule" applies only as iV — > oo; at very small 
N, a second effect must also be considered. 
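The ideal-gas figure quoted above can be checked numerically. The following Python sketch evaluates the right-hand side of (80) for $\langle \epsilon \rangle = \frac{3}{2}kT$ and $\lambda = 1/kT$; the function names and the choice of units (kT as a plain number) are illustrative assumptions, not part of the original analysis.

```python
import math

def relative_fluctuation(n_entities, mean_f, dmean_dlambda):
    """Right-hand side of Eq. (80): sqrt(1/N) * sqrt(-(1/<f>^2) d<f>/d(lambda))."""
    if n_entities <= 0:
        raise ValueError("n_entities must be positive")
    inner = -dmean_dlambda / mean_f ** 2   # term inside the second square root
    return math.sqrt(inner / n_entities)

def ideal_gas_fluctuation(n_entities, kT=1.0):
    """Assumed ideal monatomic gas: <eps> = (3/2) kT with lambda = 1/kT."""
    lam = 1.0 / kT
    mean_energy = 1.5 / lam                # <eps> = 3 / (2 lambda)
    dmean_dlambda = -1.5 / lam ** 2        # d<eps>/d(lambda)
    return relative_fluctuation(n_entities, mean_energy, dmean_dlambda)

if __name__ == "__main__":
    for n in (10, 10**3, 10**6, 10**23):
        print(n, ideal_gas_fluctuation(n), (1.5 * n) ** -0.5)
```

The two printed columns agree for any kT, since the term inside the second square root reduces to 2/3 for this system; the decay with N is the "$N^{-1/2}$ rule" discussed in the text.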

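For the canonical and other ensembles, the variability of the (superset) $\{f_{ri}\}$ within each ensemble member is examined by (see above references): 

\[ \delta(f_r) = \frac{\sqrt{\mathrm{var}(f_r)}}{\langle f_r \rangle} = \frac{\sqrt{\langle f_r^2 \rangle - \langle f_r \rangle^2}}{\langle f_r \rangle} \tag{81} \]

whence, from the Jaynes relations: 

\[ \delta(f_r) = \frac{1}{\langle f_r \rangle} \sqrt{-\frac{\partial \langle f_r \rangle}{\partial \lambda_r}} = \frac{1}{\langle f_r \rangle} \sqrt{\frac{\partial^2 \lambda_0}{\partial \lambda_r^2}} \tag{82} \]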

Whether or not this vanishes as $N \to \infty$ depends on the physical variable $r$ and the importance of interactions [32, 33, 84, 103; c.f. previous footnote]. The variability of $\{f_{ri}\}$ for the total ensemble can be examined using $\delta(N f_r)$, where $N$ is the number of ensemble members, giving a relation analogous to (80). It is commonly asserted that $N \to \infty$ (e.g. [83]), a rather questionable assumption. If correct, the total ensemble will be heavily concentrated at its ensemble means $\langle f_r \rangle$, $\forall r$. 



Jaynes' [11, 12, 136] entropy concentration theorem considers the relative importance of the equilibrium probability distribution $\mathbf{p}^* = \{p_i^*\}$ and some other distribution $\mathbf{p}' = \{p_i'\}$. From (37) or (65), the ratio of the probability of occurrence of $\mathbf{p}^*$ to that of $\mathbf{p}'$ is: 

\[ \frac{P^*}{P'} = \exp[N(-D^* + D')] \tag{83} \]

where $P^*$, $P'$ are the governing probability distributions and $D^*$, $D'$ are the cross-entropies corresponding respectively to $\mathbf{p}^*$ and $\mathbf{p}'$. This was originally formulated as the ratio of the number of ways in which $\mathbf{p}^*$ and $\mathbf{p}'$ can be realized [11, 138]: 

\[ \frac{W^*}{W'} = \exp[N(H^* - H')] \tag{84} \]

where $W^*$, $W'$ are the weights and $H^*$, $H'$ are the entropies corresponding to $\mathbf{p}^*$ and $\mathbf{p}'$. As shown by Jaynes [11, 12, 136], for $N > 1000$ even a small difference in $H$ gives an enormous ratio, revealing the combinatorial dominance of the maximum entropy position. 

¹⁰ All the listed authors consider $\delta(E)$ for a canonical ensemble, where $\langle E \rangle$ is the "mean of the total energies", but then take $\langle E \rangle = N \langle \epsilon \rangle = \frac{3}{2}NkT$ for $N$ non-interacting particles - thus assuming the system is microcanonical - giving the same result. 
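To give a feel for the size of the ratio in (84), the sketch below compares the Stirling-limit estimate $N(H^* - H')$ with the exact logarithm of the ratio of multinomial weights, computed with log-gamma functions to avoid overflow. The two-state system, the counts and the function names are assumptions chosen purely for illustration.

```python
import math

def shannon_entropy(p):
    """H = -sum p_i ln p_i (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def log_multinomial_weight(counts):
    """ln W for W = N! / prod(n_i!), evaluated exactly via lgamma."""
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

def log_weight_ratio_exact(counts_star, counts_prime):
    """ln(W*/W') without any Stirling approximation."""
    return log_multinomial_weight(counts_star) - log_multinomial_weight(counts_prime)

def log_weight_ratio_stirling(counts_star, counts_prime):
    """ln(W*/W') from Eq. (84): N (H* - H')."""
    n = sum(counts_star)
    p_star = [c / n for c in counts_star]
    p_prime = [c / n for c in counts_prime]
    return n * (shannon_entropy(p_star) - shannon_entropy(p_prime))

if __name__ == "__main__":
    star, prime = [500, 500], [600, 400]   # N = 1000, hypothetical two-state system
    print("exact    ln(W*/W') =", log_weight_ratio_exact(star, prime))
    print("Stirling ln(W*/W') =", log_weight_ratio_stirling(star, prime))
```

For this example the entropies differ by only about 0.02 per entity, yet both estimates give a log-ratio of roughly 20, i.e. the uniform distribution is realized in hundreds of millions of times more ways than the 60/40 split.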

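Assuming $\mathbf{p}^*$, $\mathbf{p}'$ satisfy the constraints ((31)-(32)), and taking the Stirling limits $N \to \infty$ and $n_i \to \infty$, an analysis similar to Kapur & Kesavan [14, §2.4.6] yields: 

\[ -D^* + D' = H^* - H' = \sum_{i=1}^{s} p_i' \ln \frac{p_i'}{p_i^*} \tag{85} \]

i.e. simply the directed divergence of $\mathbf{p}'$ from $\mathbf{p}^*$, from which the $q_i$ vanish (being incorporated into $\mathbf{p}^*$). Eqs. (83)-(84) then give: 

\[ \frac{P^*}{P'} = \frac{W^*}{W'} = \exp\left[ N \sum_{i=1}^{s} p_i' \ln \frac{p_i'}{p_i^*} \right] \tag{86} \]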

If we now put $p_i' = p_i^*(1 + \epsilon_i)$, take a series expansion of $\ln p_i'$ about $\epsilon_i = 0$, and discard all polynomial terms higher than $\epsilon_i^2$, it is shown by Kapur & Kesavan [14, §2.4.7] that (a quite different derivation is given by Jaynes [136]): 

\[ -D^* + D' = H^* - H' \approx \frac{1}{2} \sum_{i=1}^{s} \frac{(p_i' - p_i^*)^2}{p_i^*} = \frac{1}{2N} \sum_{i=1}^{s} \frac{(n_i' - n_i^*)^2}{n_i^*} = \frac{1}{2N} \chi^2 \tag{87} \]

where $n_i' = p_i' N$ is the number of entities in state $i$ due to $\mathbf{p}'$; $n_i^* = p_i^* N$ is the expected number of entities in state $i$; and we recognize $\chi^2$ as the chi-squared statistic [26, 140, 141]. In other words, we can determine the "goodness of fit" of a distribution $\mathbf{p}'$ - or of some function $F(\mathbf{p})$ which generates $\mathbf{p}'$ - to a multinomial system, by comparing the calculated $\chi^2$ to the table value $\chi^2(\nu, 1 - \alpha)$, where $\nu = s - R - 1$ is the number of degrees of freedom and $\alpha$ is the significance level (upper tail or rejection area) [136]. 



As is well known [141, 142, 143, 144] and dramatically illustrated by Jaynes [15, chap. 9], the $\chi^2$ statistic is an unreliable test for goodness of fit, being highly (and erroneously) sensitive to the occurrence of unlikely events. There is no need to conduct the simplification of (87); instead, from (85): 

\[ -D^* + D' = H^* - H' = \frac{1}{N} \sum_{i=1}^{s} n_i' \ln \frac{n_i'}{n_i^*} = \frac{\eta}{N} \tag{88} \]

where $\eta$ is the correct test statistic for the goodness of fit of $\mathbf{p}'$ or its generator $F(\mathbf{p})$ to a multinomial system, subject to the Stirling limits ($\eta$ is given by Hoel [142, §10.1]; and by Jaynes [15, §9.11.1] in the form $\psi = 10\eta/\ln(10)$, using an obscure decibel notation). The calculated $\eta$ can be compared to the "table value" $\eta(\nu, 1 - \alpha)$; alternatively, two distributions $\mathbf{p}'$ and $\mathbf{p}''$ can be ranked by comparing their corresponding $\eta'$ and $\eta''$. Eqs. (86) and (88) finally give: 

\[ \frac{P^*}{P'} = \frac{W^*}{W'} = \exp(\eta) \tag{89} \]
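The relation between the two statistics can be illustrated numerically. The sketch below computes $\eta = \sum_i n_i' \ln(n_i'/n_i^*)$ as in (88) and the $\chi^2$ sum of (87) for a small perturbation of the expected counts; the counts and function names are invented for illustration only.

```python
import math

def eta_statistic(n_obs, n_exp):
    """eta = sum n'_i ln(n'_i / n*_i), following Eq. (88); empty cells contribute 0."""
    return sum(o * math.log(o / e) for o, e in zip(n_obs, n_exp) if o > 0)

def chi_squared(n_obs, n_exp):
    """chi^2 = sum (n'_i - n*_i)^2 / n*_i, the statistic appearing in Eq. (87)."""
    return sum((o - e) ** 2 / e for o, e in zip(n_obs, n_exp))

if __name__ == "__main__":
    n_star = [500, 300, 200]    # expected counts n*_i (hypothetical), N = 1000
    n_prime = [510, 295, 195]   # counts n'_i under the trial distribution p'
    eta = eta_statistic(n_prime, n_star)
    chi2 = chi_squared(n_prime, n_star)
    print("eta       =", eta)
    print("chi^2 / 2 =", chi2 / 2.0)
    print("P*/P' = exp(eta) =", math.exp(eta))
```

For small deviations the two agree closely, as (87) implies ($\eta \approx \chi^2/2$); they diverge when some $n_i^*$ is very small, which is the regime in which the text cautions against $\chi^2$.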



III. APPLICABILITY OF MULTINOMIAL STATISTICS 
A. The "Multinomial Family" 

Why have the Shannon information entropy and Kullback-Leibler cross-entropy proved to be of such utility, in an extremely wide range of disciplines? The answer lies in the fact that an extraordinarily large number of probability functions $p_{i,\ldots}$ or $p(x, \ldots)$ of an observable, encompassing a wide range of statistical problems, can be obtained from the Stirling approximation to the multinomial distribution as special or limiting cases. For example, in discrete statistics, the uniform, geometric, generalized geometric, power-function, Riemann zeta function, Poisson, binomial, negative binomial, generalized negative binomial and various Lagrangian distributions (and many others) have been obtained from the Shannon entropy subject to various constraints [14, 20]. Similarly, in continuous statistics, the uniform, normal (Gaussian), Laplace, generalized Cauchy, generalized logistic, generalized extreme value, exponential, Pareto, gamma, beta (of first or second kind), generalized Weibull, lognormal, Poisson, power-function and many new distributions, and various multivariate forms, can be obtained from the continuous form of the Shannon entropy subject to various constraints [14, 20]. Many additional distributions can be obtained from the Kullback-Leibler cross-entropy in discrete or continuous form, subject to various prior distributions and constraints [14]. All these functions therefore constitute particular examples of multinomial statistics, and collectively form the multinomial family of statistical distributions. The broad applicability of the multinomial distribution, produced by the (fascinating) isomorphism of many probabilistic problems - such as of the "balls-in-boxes" and "multiple selection" systems described in §II C - is responsible for the wide utility of the Kullback-Leibler cross-entropy and Shannon entropy functions. 
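As one concrete member of this family, maximizing the Shannon entropy over the states $i = 0, 1, 2, \ldots$ subject to normalization and a prescribed mean gives $p_i \propto e^{-\lambda i}$, i.e. a geometric distribution with ratio $r = m/(1+m)$ for mean $m$. The sketch below checks this numerically against a Poisson distribution of the same mean; the truncation length, the choice of comparison distribution and all function names are illustrative assumptions.

```python
import math

def geometric_maxent(mean, n_terms=400):
    """MaxEnt pmf on {0, 1, 2, ...} for a given mean: p_i = (1 - r) r**i, r = m/(1+m)."""
    r = mean / (1.0 + mean)
    return [(1.0 - r) * r ** i for i in range(n_terms)]

def poisson_pmf(mean, n_terms=400):
    """Poisson pmf built by the recurrence p_k = p_{k-1} * mean / k."""
    pmf = [math.exp(-mean)]
    for k in range(1, n_terms):
        pmf.append(pmf[-1] * mean / k)
    return pmf

def shannon_entropy(p):
    """H = -sum p_i ln p_i; zero-probability terms are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def mean_of(p):
    """Mean of a pmf defined on the states 0, 1, 2, ..."""
    return sum(i * x for i, x in enumerate(p))

if __name__ == "__main__":
    m = 2.0
    geo, poi = geometric_maxent(m), poisson_pmf(m)
    print("means    :", mean_of(geo), mean_of(poi))
    print("entropies:", shannon_entropy(geo), shannon_entropy(poi))
```

Both distributions satisfy the same mean constraint, but the geometric form has the larger entropy, as expected of the MaxEnt solution for this single constraint.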



B. Non-Multinomial Statistics 

Notwithstanding the success of multinomial statistics, it is important to emphasize that a number of statistical functions are incompatible with the Shannon entropy and/or Kullback-Leibler cross-entropy, and are therefore not of multinomial character. Several of these (e.g. Bose-Einstein, Fermi-Dirac, Rényi, Tsallis and Kaniadakis entropies) reduce to the Shannon entropy as a limiting case [35, 40, 41, 91]; such systems may therefore be approximated by multinomial statistics only when these limiting conditions are attained. More thorough analyses of non-multinomial statistics must be deferred to later studies; however, their importance is here noted. 

From the preceding analysis, it is clear that the definition of entropy (3) promulgated by Boltzmann and Planck [3, 4] can be used irrespective of whether the distribution is of multinomial character. A more comprehensive version, in which $P$ now represents the governing probability distribution of any type and not only the multinomial distribution, is given in (42). The corresponding entropy is: 

\[ H = \kappa \left( \frac{\ln P_u}{N} + C \right) = \kappa \left( \frac{\ln W}{N} + C' \right) \tag{90} \]

where $C$, $C'$ and $\kappa$ are arbitrary constants. (Note that the Boltzmann [3] - Planck [4] formula (3) is often misleadingly quoted as $S = k \ln W$; this is correct only if $S$ refers to the total entropy of the system, not the entropy per unit entity.) Indeed, it is not necessary to use a logarithmic transformation; for some distributions, some other transformation function $\phi$ may be more convenient, giving the generalized definitions of cross-entropy and entropy: 

\[ -D_{\mathrm{gen}}(\mathbf{p}, \ldots | \mathbf{q}, N, \ldots) = \kappa \left( \phi(P, \ldots) + C \right) \tag{91} \]
\[ H_{\mathrm{gen}}(\mathbf{p}, \ldots | N, \ldots) = \kappa \left( \phi(P_u, \ldots) + C \right) = \kappa \left( \phi(W, \ldots) + C' \right) \tag{92} \]

with the only condition on $\phi$ being: 

\[ \mathrm{extr}\,[\phi(P, \ldots)] = \max[P, \ldots] \tag{93} \]

where again $C$, $C'$ and $\kappa$ are arbitrary, whilst "..." allows for other parameters or prior information. In many cases $P$ will be a product-like function of $s$ local probability distributions $h_i(p_i, q_i, N, \ldots)$; the appropriate choice of $\phi$ is the logarithm-like operator which transforms $P$ neatly into a sum of terms in $\phi(h_i(p_i, q_i, N, \ldots))$, simplifying its extremization¹¹ (for parallel discussions of deformed logarithms, see [149, 150]). Similarly, it may be convenient to choose $\phi$ and $\kappa$ which define an entropy function with a "nice" asymptotic limiting form, in the sense of large deviations theory [80, 81]. Clearly, the information entropy (1) given by Shannon [6] - although derived from sound axiomatic postulates, and of quite broad scope - is strictly valid only for multinomial systems subject to the Stirling approximation. This may be appropriate for communication signals of infinite length, but is surely insufficient to underpin the vast field of information theory in general. 
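The dependence of the Shannon form on the Stirling limit can be seen by evaluating the combinatorial entropy per entity, $(1/N) \ln W$, directly from the multinomial weight at finite $N$. The sketch below does this with log-gamma functions, taking the logarithmic transformation with unit scale constant and zero additive constant; these choices, the counts and the function names are assumptions made for illustration.

```python
import math

def exact_entropy_per_entity(counts):
    """(1/N) ln W for the multinomial weight W = N! / prod(n_i!), with no Stirling step."""
    n = sum(counts)
    log_w = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_w / n

def shannon_entropy_of_counts(counts):
    """-sum p_i ln p_i with p_i = n_i / N, i.e. the Stirling-limit form."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

if __name__ == "__main__":
    for counts in ([5, 5], [50, 50], [5000, 5000]):
        exact = exact_entropy_per_entity(counts)
        limit = shannon_entropy_of_counts(counts)
        print(sum(counts), exact, limit, limit - exact)
```

The gap between the two columns is substantial for N = 10 and shrinks slowly as N grows, which is the sense in which the Shannon entropy is an asymptotic form of the combinatorial definition.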

C. Further Discussion 



In his many works, Jaynes expounds the "Bayesian" or "subjective" view of probabilities, which represent assignments of one's belief based on the available information, and argues against the "frequentist" view in which probabilities are interpreted strictly as frequency assignments [151, 152]. Separately, Jaynes demonstrates the equivalence of MaxEnt based on the Shannon entropy, and combinatorial analysis using the multinomial weight (the so-called Wallis derivation) [10, 11]. At this point, however, he considers the combinatorial approach to represent a frequency interpretation, stating [11]: "the probability distribution which maximizes the entropy is numerically identical with the frequency distribution which can be realized in the greatest number of ways" [his emphasis]. This identification of the combinatorial approach with the frequentist view is unfortunate; in fact, by applying MaxEnt based on the Shannon entropy, one assumes (implicitly) that the phenomenon being examined follows the multinomial distribution, and one uses one's prior knowledge to infer (hypothesize) the available states $i$ (for a parallel discussion, see Bhandari [153])¹². The calculated probability distribution $\mathbf{p}^*$ is therefore valid only in the "subjective" sense (i.e. exists only as an inference of the observer) until verified by experiment. Even if so "verified", there will always be room for doubt over its validity. 

¹¹ The recent derivation of the Tsallis [35] entropy by Suyari and co-workers [145, 146, 147, 148] using a transformation of the form $\phi = \ln_{2-q}(W_{2-q})$, where $\ln_q$ is the $q$-logarithmic function and $W_q$ is a $q$-multinomial coefficient, provides a fascinating example of an alternative transformation function. 

¹² Jaynes appears to reach essentially this viewpoint in his final work [15, chaps. 9, 11; especially §9.5-9.6, 11.4]. 
Indeed, the calculated MinXEnt probability (e.g. (28)-(29)) can be expressed in a reversed form of Bayes' theorem: 

\[ p_i^* = P^*(i|M,I) = \frac{P(i|I)\,P^*(M|i,I)}{P^*(M|I)} = \frac{P(i|I)\,P^*(M|i,I)}{\sum_{i=1}^{s} P(i|I)\,P^*(M|i,I)} \tag{94} \]

where $P$ is a probability, $P^*$ is a most probable (modal) probability, $i$ is the $i$th distinguishable outcome (datum) within a set of $s$ such outcomes, $M$ is the $i$th manifestation of the hypothesised model $P$, not necessarily of multinomial form, and $I$ is the prior information. For the problems considered here, $I$ includes the constraints, any approximation or limit assumptions (e.g. the Stirling approximation) and any other relevant prior knowledge; if desired, these can be itemised separately. We immediately recognise the denominator in (94) as the partition function $Z_q$ (29) or its equivalent, whilst $P(i|I) = P^*(i|I) = q_i$ is the prior probability. The generalized MinXEnt or MaxEnt methods therefore provide a method, in the absence of any sampling data, to "bootstrap" a sampling distribution $\{P^*(i|M,I)\}$ (we could call it the posterior pre-sampling distribution) from some hypothesis distribution $\{P^*(M|i,I)\}$ and prior distribution $\mathbf{q}$. The latter two distributions are necessarily embedded within the governing distribution $P$, being obtainable from it by extremization of (91) or (92) subject to $I$¹³. 
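The structure of (94) can be made concrete for a single constraint. In the sketch below the hypothesis term is assumed to take the exponential form $e^{-\lambda f_i}$ familiar from MinXEnt with one moment constraint; this form, the prior, the values $f_i$ and the multiplier are all hypothetical choices for illustration, not values taken from the text.

```python
import math

def minxent_posterior(prior, f_values, lam):
    """Return (p*, Z_q) with p*_i = q_i exp(-lam f_i) / Z_q, Z_q = sum_i q_i exp(-lam f_i).

    prior    : q_i = P(i|I), the prior probabilities
    f_values : f_i, the constrained quantity in each state (assumed single constraint)
    lam      : Lagrangian multiplier, assumed already determined from the constraint
    """
    weights = [math.exp(-lam * f) for f in f_values]   # plays the role of P*(M|i,I)
    z_q = sum(q * w for q, w in zip(prior, weights))   # denominator of Eq. (94)
    return [q * w / z_q for q, w in zip(prior, weights)], z_q

if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]
    f = [0.0, 1.0, 2.0]
    p_star, z_q = minxent_posterior(q, f, lam=1.0)
    print("Z_q =", z_q)
    print("p*  =", p_star, "sum =", sum(p_star))
```

With a zero multiplier the constraint carries no information and the prior is returned unchanged; a positive multiplier shifts probability towards states of low $f_i$, exactly as a Bayesian update with likelihood $e^{-\lambda f_i}$ would.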

In consequence, the generalized definitions of cross-entropy and entropy given here ((91)-(92)) fit seamlessly into a Bayesian inferential framework [c.f. 21, 102, 152]. In such cases, $\mathbf{q}$ represents a "Bayesian prior distribution", "Jeffreys' uninformative prior" [154, 155] or "Jaynes' measure distribution" [10, 11], whilst $P$ represents one's postulated understanding of the probabilistic structure of the phenomenon at hand. The broader Jaynesian program of maximum entropy analysis as a method of statistical inference is therefore untouched (in fact, enhanced) by the present analysis. 

¹³ A quite different connection between Bayesian and combinatorial perspectives was given recently [78]. 

Now considering the other bases of entropy listed in §I: 

(a) In the inverse modelling basis developed by Kapur, Kesavan and co-workers, one works backwards from a hypothesized or observed probability distribution ($\mathbf{p}$), prior distribution ($\mathbf{q}$) and constraints (CO-CP), to obtain the measure of cross-entropy or entropy applicable to the process. Using (91) or (92), such inverse methods could then be used to determine the governing probability distribution $P$ of the process. In effect, this approach was adopted in statistical mechanics by Planck [4], Einstein [28, 29, 156] and Bose [27], in which measurements of the thermodynamic entropy were used to determine the statistical weight. Alternatively, one can work "sideways" from the observed, prior and governing distributions ($\mathbf{p}$, $\mathbf{q}$ and $P$) to determine the constraints [c.f. 60, 61, 62]. Such methods offer powerful extensions to existing theory, yet have barely been examined in the literature. 

(b) In the game-theoretic basis founded by von Neumann and Morgenstern [63], a decision tree or matrix is built up by analysis of a game between two or more players. The statistical structure of the game and optimal playing strategies are then determined. Originally devised for gambling and economics, this basis has found diverse application to political science, biological evolution, military strategy, counter-terrorism, queuing theory, operations research, system dynamics and many other fields. In many game-playing scenarios, there exists a game-theoretic equilibrium (in economics: "Nash equilibrium") at which no player can benefit by changing his/her strategy [63, 64]; such equilibrium concepts have more recently been related to information measures, coding theory and MaxEnt [e.g. 65, 66, 67, 68, 69]. Although underpinned by axiomatic arguments, this basis is quite complementary to the combinatorial scheme outlined here, enabling the derivation of information measures and governing distributions for systems of complicated statistical or dynamic structure. 

(c) In the information-geometric or statistical manifold basis, a statistical or probabilistic model is represented as a geometric structure (manifold) in some mathematical space, e.g. the space of probabilities, the space of population parameters or the space of possible distributions for the system [70, 71, 72, 73]. This structure can then be analysed geometrically, using measures of distance (metrics), shape and connectivity (topology), tangency and differentiability. As an example, a cross-entropy or divergence measure can be interpreted as the probabilistic distance of distribution $\mathbf{p}$ from $\mathbf{q}$ [17]. Information-geometric arguments have recently been applied to alternative entropy concepts [e.g. 157], but much more work is required in this field. Since the information-geometric basis serves as a representation of a system, rather than a cause, it is subsidiary to the other bases; however, it provides a valuable tool for the analysis of combinatorial concepts and distributions, and their connections to other physical theory (e.g. general relativity). 

The analysis to this point has followed a long path, only to arrive more or less at its 
starting point: the combinatorial entropy of Boltzmann and Planck (although the idea is 
taken somewhat further than they had imagined). The fact that this discussion is still 
necessary in the 21st century reflects the great gulf between present-day statistical mechanics 
and thermodynamics - still taught much as they were 50 or even 100 years ago - and the more 
recent but surprisingly narrow field of information theory initiated by Shannon [6]. The gulf 
persists despite the efforts of Bose, Einstein, Fermi and Dirac, amongst others, in statistical 
mechanics, and of Jaynes, Tribus, Kapur, Kesavan and many others in information theory 
and maximum entropy methods. The two fields are, in fact, one. Appreciation of this fact 
(by both sides) would permit the development of a much broader discipline of "combinatorial 
information theory" than at present, applicable to many different types of problems. 

In our search for meaning in the universe and our existence, the sentiment of John Wheeler is frequently quoted: "it from bit" [158]; i.e. the entirety of existence (it) arises from information-theoretic principles (bit). This statement implies a universe built up by observer-participancy, individual measurement by measurement. However, from the present analysis [see also 42, 43], each combinatorial family has a different kind of "bit", producing a vast array of different observational frameworks and observer-dependent (subjective?) realities. This can be paraphrased as "both it and bit from prob.", i.e. both existence and information theory arise from raw probabilistic constructs. This principle clearly underpins the probabilistic definition of the second law of thermodynamics, "a system tends towards its most probable state", and may well explain the peculiarities of quantum mechanics [28, 29, 30, 31, 32, 33, 42, 43]. 



IV. CONCLUSIONS 



The philosophical bases of the entropy and cross-entropy concepts are critically examined, 
with particular attention to the information-theoretic, axiomatic and combinatorial inter- 
pretations. It is shown that the combinatorial basis, as first promulgated by Boltzmann and 
Planck, is the most fundamental (most primitive) of these three bases. Not only does it provide 
(i) a derivation of the Kullback-Leibler cross-entropy and Shannon entropy functions, 
as simplified forms of the multinomial distribution subject to the Stirling approximation; 
the combinatorial approach also yields (ii) an explanation for the need to maximize en- 
tropy (or minimize cross-entropy) to find the most probable realization; and (iii) generalized 
definitions of entropy and cross-entropy for systems which do not satisfy the multinomial 
distribution, i.e. which fall outside the domain of the Kullback-Leibler and Shannon mea- 
sures. The information-theoretic and axiomatic bases of cross-entropy and entropy - whilst 
of tremendous importance and utility - are therefore seen as secondary viewpoints, which 
lack the breadth of the combinatorial approach. The view of Shannon, Jaynes and their 
followers - in which the Shannon entropy or Kullback-Leibler cross-entropy is taken as the 
starting point and universal tool for analysis - is not seen as incorrect, but simply incomplete. 
On the other hand, the viewpoint of many scientists - who consider statistical mechanics to 
be a branch of classical mechanics or quantum physics, rather than of statistical inference - 
is also incomplete. A more detailed understanding of the combinatorial basis will enable de- 
velopment of a powerful body of "combinatorial information theory" , as a tool for statistical 
inference in all fields. 

The generic formulation of statistical mechanics developed by Jaynes [20] is re-examined in light of the combinatorial approach. The analysis yields 



several new concepts including a generalized Clausius inequality, a generalized free energy 
("free information") function, a generalized Gibbs-Duhem relation and phase rule, and a 
reappraisal of fluctuation theory and Jaynes' entropy concentration theorem. The free in- 
formation concept provides a framework for the application of thermodynamic-like tools 
(e.g. state functions, cyclic integrals, efficiency ratios, Gibbs-Duhem and phase relations, 
Maxwell-like relations and Jaynes relations) for the analysis of probabilistic systems of any 
type. Finally, the combinatorial basis is shown to be embedded within a Bayesian statistical 
framework. 



Acknowledgments 

This work began in 2002, and was in part completed during sabbatical leave in 2003 at 
Clarkson University, New York; McGill University, Quebec; Rice University, Texas and Col- 
orado School of Mines, Colorado, supported by The University of New South Wales and the 
Australian-American Fulbright Foundation. The work benefited from valuable discussions 
with UNSW@ADFA colleagues, with participants at the 2005 NEXT Sigma Phi conference, 
Kolymbari, Crete, Greece, and (since 2006) with Marian Grendar. 



References 



[1] R. Clausius, Poggendorfs Annalen 125 (1865) 335; English transl.: R.B. Lindsay in J. Kestin 
(ed.) The Second Law of Thermodynamics, Dowden, Hutchinson & Ross, PA (1976) 162. 

[2] R. Clausius, Die Mechanische Wärmetheorie (The Mechanical Theory of Heat), F. Vieweg, 
Braunschweig, 1876; English transl.: W.R. Browne, Macmillan & Co., London, 1879. 

[3] L. Boltzmann, Wien. Ber. 76 (1877) 373; English transl.: J. Le Roux (2002) 
http://www.essi.fr/~leroux/ 

[4] M. Planck, Annalen der Physik 4 (1901) 553. 

[5] M. Planck, Vorlesungen über die Theorie der Wärmestrahlung (The Theory of Heat Radiation), 
P. Blakiston Son & Co., 1913; English transl.: M. Masius, Dover Publ., NY, 1959. 
[6] C.E. Shannon, Bell Sys. Tech. J. 27 (1948) 379; 623. 
[7] E.T. Jaynes, Phys. Rev. 106 (1957) 620. 
[8] M. Tribus, J. Appl. Mech., Trans. ASME 28 (1961) 1. 

[9] M. Tribus, Thermostatics and Thermodynamics, D. Van Nostrand Co., Princeton, NJ, 1961. 

[10] E.T. Jaynes, in K.W. Ford (ed.), Brandeis University Summer Institute, Lectures in Theoretical Physics, Vol. 3, Benjamin-Cummings Publ. Co. (1963) 181; also in E.T. Jaynes, (R.D. Rosenkrantz, ed.) Papers on Probability, Statistics and Statistical Physics, D. Reidel Publ. Co., Dordrecht, Holland (1983) 39. 

[11] E.T. Jaynes, IEEE Trans. Systems Science and Cybernetics SSC-4 (1968) 227. 

[12] E.T. Jaynes, in R.D. Levine, M. Tribus (eds.), The Maximum Entropy Formalism, MIT Press, Cambridge, MA (1978) 15; also in E.T. Jaynes, (R.D. Rosenkrantz, ed.) Papers on Probability, Statistics and Statistical Physics, D. Reidel Publ. Co., Dordrecht, Holland (1983) 210. 

[13] J.N. Kapur, H.K. Kesavan, The Generalized Maximum Entropy Principle (with Applications), Sandford Educational Press, Waterloo, Canada, 1987. 

[14] J.N. Kapur, H.K. Kesavan, Entropy Optimization Principles with Applications, Academic Press, Inc., Boston, MA, 1992. 

[15] E.T. Jaynes, (G.L. Bretthorst, ed.) Probability Theory: The Logic of Science, Cambridge U.P., Cambridge, 2003. 

[16] S. Kullback, R.A. Leibler, Annals Math. Stat. 22 (1951) 79. 

[17] S. Kullback, Information Theory and Statistics, John Wiley, NY, 1959. 

[18] F. Snickars, J.W. Weibull, Regional Science and Urban Economics 7 (1977) 137. 

[19] M. Tribus, Rational Descriptions, Decisions and Designs, Pergamon Press, NY, 1969. 

[20] J.N. Kapur, Maximum-Entropy Models in Science and Engineering, John Wiley, NY, 1989. 

[21] M. Tribus, in G.J. Erickson, C.R. Smith (eds.), Maximum-Entropy and Bayesian Methods in Science and Engineering, Kluwer Academic Publ., Dordrecht, 1 (1988) 31. 

[22] E.T. Jaynes, in G.J. Erickson, C.R. Smith (eds.), Maximum-Entropy and Bayesian Methods in Science and Engineering, Kluwer, Dordrecht, 1 (1988) 25. 
[23] J.E. Shore, R.W. Johnson, IEEE Trans. Information Theory IT-26(1) (1980) 26. 
[24] R.D. Levine, J. Phys. A 13 (1980) 91. 

[25] R.A. Fisher, Philos. Trans. Royal Soc. London A 222 (1922) 309. 
[26] R.A. Fisher, Proc. Camb. Philos. Soc. 22 (1925) 700. 
[27] S.N. Bose, Z. Phys. 26 (1924) 178. 

[28] A. Einstein, Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl (1924) 261. 
[29] A. Einstein, Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl (1925) 3. 
[30] E. Fermi, Z. Phys. 36 (1926) 902. 
[31] P.A.M. Dirac, Proc. Roy. Soc. 112 (1926) 661. 

[32] R.C. Tolman, The Principles of Statistical Mechanics, Oxford Univ. Press, London, 1938. 

[33] N. Davidson, Statistical Mechanics, McGraw-Hill, NY, 1962. 

[34] A. Rényi, Proc. 4th Berkeley Symp. Math. Stat. and Prob. 1 (1961) 547. 

[35] C. Tsallis, J. Stat. Phys. 52(1/2) (1988) 479. 

[36] C. Tsallis, in S. Abe, Y. Okamoto (eds.), Nonextensive Statistical Mechanics and its Applications, Springer, Berlin, (2001) 3. 

[37] B.D. Sharma, D.P. Mittal, J. Math. Sci. (Calcutta) 10 (1975) 28. 

[38] B.D. Sharma, D.P. Mittal, J. Combinat. Inform. Sys. Sci. 2 (1977) 122. 

[39] C. Beck, E.G.D. Cohen, Physica A 322 (2003) 267. 



[40] G. Kaniadakis, Physica A 296(3-4) (2001) 405. 
[41] G. Kaniadakis, Phys. Rev. E 66(5) (2002) 056125. 
[42] R.K. Niven, Phys. Lett. A 342(4) (2005) 286. 
[43] R.K. Niven, Physica A 365(1) (2006) 142. 

[44] J. Aczél, Z. Daróczy, On Measures of Information and their Characterization, Academic 
Press, NY, 1975. 

[45] J. Burbea, in S. Kotz, N.L. Johnson (eds), Encyclopedia of Statistical Sciences, John Wiley, 
NY, 4 (1983) 290. 

[46] T. Papaioannou, in S. Kotz, N.L. Johnson (eds), Encyclopedia of Statistical Sciences, John 

Wiley, NY, 5 (1985) 391. 
[47] J.N. Kapur, J. Inform. Optimiz. Sci. 4(3) (1983) 207. 
[48] J.N. Kapur, Advances in Management Stud. 3(1) (1984) 1. 
[49] J.N. Kapur, Indian J. Pure Appl. Math. 17(4) (1986) 429. 

[50] M. Behara, Additive and Nonadditive Measures of Entropy, John Wiley, NY, 1990. 

[51] C. Arndt, Information Measures: Information and its Description in Science and Engineering, 
Springer Verlag, Berlin, 2001. 

[52] L. Szilard, Zeitschrift für Physik 53 (1929) 840; English transl.: A. Rapoport, M. Knoller 
(1964), in H.S. Leff, A.F. Rex, Maxwell's Demon: Entropy, Information, Computing, Princeton 
Univ. Press, NJ, (1990) 124. 

[53] N. Wiener, Cybernetics: or Control and Communication in the Animal and the Machine, 
John Wiley, NY, 1948. 

[54] L. Brillouin, J. Appl. Phys. 22(3) (1951) 334. 

[55] L. Brillouin, J. Appl. Phys. 24(9) (1953) 1152. 

[56] M. Tribus, E.C. McIrvine, Scientific American 225 (1971) 179. 

[57] A.M. Yaglom, I.M. Yaglom, Probability and Information, D. Reidel Publishing Co., Dor- 
drecht, Netherlands, 1983. 
[58] T.M. Cover, J.A. Thomas, Elements of Information Theory, John Wiley, NY, 1991. 
[59] H.K. Kesavan, J.N. Kapur, IEEE Trans. Sys. Man Cybern. 19(5) (1989) 1042. 
[60] J.N. Kapur, G. Baciu, H.K. Kesavan, Int. J. Sys. Sci. 26(1) (1995) 1. 
[61] L. Yuan, H.K. Kesavan, IEEE Trans. Sys. Man Cybern. C 28(3) (1998) 488. 
[62] M. Srikanth, H.K. Kesavan, P.H. Roe, IEEE Trans. Sys. Man Cybern. C 30(1) (2000) 77. 



[63] J. von Neumann, O. Morgenstern, The Theory of Games and Economic Behavior, Princeton U.P, 1944. 

[64] J. Nash, Proc. Nat. Acad. USA 36(1) (1950) 48. 

[65] F. Topsøe, in A. Mohammad-Djafari, G. Demoment (eds.), Maximum Entropy and Bayesian Methods, Kluwer Academic, Dordrecht, (1993) 15. 

[66] P. Harremoës, F. Topsøe, Entropy 3 (2001) 191. 

[67] F. Topsøe, IEEE Trans. Info. Theory 48(8) (2002) 2368. 

[68] F. Topsøe, Physica A 340 (2004) 11. 

[69] P.D. Grünwald, A.P. Dawid, Annals Stat. 32(4) (2004) 1367. 

[70] A. Bhattacharyya, Bull. Calcutta Math. Soc. 35 (1943) 99. 

[71] C.R. Rao, Bull. Calcutta Math. Soc. 37 (1945) 81. 

[72] S. Amari, Differential-Geometrical Methods in Statistics, Springer-Verlag, Berlin, 1985. 

[73] J. Burbea, in S. Kotz, N.L. Johnson (eds.), Encyclopedia of Statistical Sciences, John Wiley, NY, 7 (1986) 241. 

[74] R.K. Niven, (2005) cond-mat/0512017 v1. 

[75] M. Grendar, M. Grendar, Information Theory in Mathematics, Balatonlelle, Hungary, July 2000. 

[76] M. Grendar, M. Grendar, in A. Mohammad-Djafari (ed.) Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP (Melville), (2001) 83. 

[77] M. Grendar, M. Grendar, in G. Erickson, Y. Zhai (eds.) Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP (Melville), (2004) 97. 

[78] M. Grendar, M. Grendar, version 1: in G. Erickson, Y. Zhai (eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP (Melville), (2004) 490; version 2: physics/0308005 v2. 

[79] M. Grendar, in R. Fischer, R. Preuss, U. von Toussaint, (eds.), Bayesian Inference and Maximum Entropy Methods in Science and Engineering, AIP (Melville), (2004) 470. 

[80] R.S. Ellis, Entropy, Large Deviations, and Statistical Mechanics, Springer-Verlag, NY, 1985. 

[81] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications, Jones and Bartlett Publ., Boston. 

[82] L. Brillouin, J. Appl. Phys. 22(3) (1951) 338. 

[83] E. Schrödinger, Statistical Thermodynamics, Cambridge U.P., Cambridge, 1952. 



[84] T.L. Hill, Statistical Mechanics: Principles and Selected Applications, McGraw-Hill, NY, 
1956. 

[85] H. Eyring, D. Henderson, B.J. Stover, E.M. Eyring, Statistical Mechanics and Dynamics, 

John Wiley, NY, 1964. 
[86] E.A. Desloge, Statistical Physics, Holt, Rinehart & Winston, NY, 1966. 
[87] R. Abe, Statistical Mechanics, University of Tokyo Press, Tokyo, 1975. 
[88] D.A. McQuarrie, Statistical Mechanics, Harper & Row, NY, 1976. 

[89] P.W. Atkins, Physical Chemistry, 2nd ed., Oxford Univ. Press, Oxford, 1982, chap. 20, 
appendix Al. 

[90] T.R. Jefferson, C.H. Scott, Int. J. Mineral Proc. 18 (1986) 251. 
[91] J.N. Kapur, Bull. Math. Assoc. India 21 (1989) 39. 

[92] J. Stirling, Methodus Differentialis: Sive Tractatus de Summatione et Interpolatione Serierum 
Infinitarum, Gul. Bowyer, London, 1730, Propositio XXVII, 135-139. 

[93] A. de Moivre, Miscellanea Analytica de Seriebus et Quadraturis, J. Tonson & J. Watts, 
Londini, c.1733. 

[94] B.R. Frieden, Science from Fisher Information, An Introduction, 2nd ed., Cambridge U.P., 
Cambridge, 2004. 

[95] R.K. Niven, Combinatorial basis and supersets of the Fisher information and auto-cross-entropy, in prep.
[96] L. Brillouin, Am. Scientist 37 (1949) 554. 
[97] L. Brillouin, Am. Scientist 38 (1950) 594. 
[98] R.V.L. Hartley, Bell Sys. Tech. J. 7(3) (1928) 535. 
[99] J.F. Young, Information Theory, Butterworth, London, UK, 1971. 
[100] R.K. Niven, Physica A, 334(3-4) (2004) 444. 

[101] J. Skilling, in G.J. Erickson, C.R. Smith (eds.), Maximum-Entropy and Bayesian Methods in Science and Engineering, Kluwer, Dordrecht, 1 (1988) 173.
[102] R.T. Cox, The Algebra of Probable Inference, Johns Hopkins Press, Baltimore, 1961.
[103] R.H. Fowler, Statistical Mechanics, 2nd ed., Cambridge U.P., Cambridge, 1936. 
[104] A. de Moivre, Philos. Trans., 27 (1712) 213, Prob. 8; English transl.: B. McClintock, Int. Stat. Rev. 52(3) (1984) 229.
[105] W. Feller, An Introduction to Probability Theory and its Applications, 2nd ed., John Wiley, NY, 1957.

[106] M.V. Ratnaparkhi, in S. Kotz, N.L. Johnson (eds.), Encyclopedia of Statistical Sciences, 
John Wiley, NY, 5 (1985) 659. 



[107] I. Buteo, Logistica, Lugduni, (1559) 312-329; as quoted by Edwards [108].
[108] A.W.F. Edwards, Pascal's Arithmetical Triangle: The Story of a Mathematical Idea, Johns Hopkins University Press, Baltimore, 2002.
[109] Bhaskara, Lilavati, c.1150, chap. IV (section VI) and chap. XIII; English transl.: H.T. Colebrooke, John Murray, London, 1817.
[110] "IMDMI", in M. Mersenne, Harmonicorum libri XII, Lutetiae Parisiorum, 1636, Book VII, Prop. V, 118-119; see discussion by Edwards [108].
[111] M. Massieu, Comptes Rendus 69 (1869) 858; 1057.
[112] J.W. Gibbs, Elementary Principles of Statistical Mechanics, Dover Publ., NY, 1902.
[113] A. Einstein, Annalen der Physik 9 (1902) 417; English transl.: A. Beck, P. Havas, The Collected Papers of Albert Einstein, Princeton Univ. Press, NJ, 2 (1989) 30.
[114] A. Einstein, Annalen der Physik 11 (1903) 170; English transl.: A. Beck, P. Havas, The Collected Papers of Albert Einstein, Princeton Univ. Press, NJ, 2 (1989) 48.
[115] J.W. Gibbs, Trans. Connecticut Acad. Oct. 1875-May 1876: 108; May 1877-July 1878: 343; Am. J. Sci. 16 (1878) 441; also in J.W. Gibbs, The Scientific Papers of J. Willard Gibbs, Dover Publ., NY, (1961) 55.
[116] D. Zwillinger, CRC Standard Mathematical Tables and Formulae, Chapman & Hall / CRC Press, Boca Raton, FL, 2003.
[117] L. Onsager, Phys. Rev. 37 (1931) 405.
[118] L. Onsager, Phys. Rev. 38 (1931) 2265.
[119] A. Einstein, Annalen der Physik 14 (1904) 354; English transl.: A. Beck, P. Havas, The Collected Papers of Albert Einstein, Princeton Univ. Press, NJ, 2 (1989) 68.
[120] A. Plastino, A.R. Plastino, Phys. Lett. A 226 (1997) 257.
[121] R.B. Evans, A Proof that Essergy is the Only Consistent Measure of Potential Work (for Chemical Substances), PhD thesis, Dartmouth College, NH, 1969 (unpub.).
[122] J.H. Keenan, Thermodynamics, John Wiley, NY, 1941.
[123] J.H. Keenan, Brit. J. Appl. Phys. 2 (1951) 183.
[124] Z. Rant, Forschung im Ingenieurwesen 22(1) (1956) 36.




[125] R.A. Gaggioli, Chem. Eng. Sci. 17 (1962) 523.

[126] J.E. Ahern, The Exergy Method of Energy Systems Analysis, John Wiley, NY, 1980. 
[127] E. Sciubba, Int. J. Energy Research 29 (2005) 613. 
[128] E. Sciubba, S. Ulgiati, Energy 30 (2005) 1953. 

[129] K. Martinas, Periodica Polytechnica Ser. Chem. Eng. 42(1) (1998) 69. 

[130] K. Martinas, M. Frankowicz, Periodica Polytechnica Ser. Chem. Eng. 44(1) (2000) 29. 

[131] B. Gaveau, K. Martinas, M. Moreau, J. Toth, Physica A 305 (2002) 445. 

[132] M. Tribus, R.B. Evans, A Contribution to the Theory of Thermoeconomics, UCLA Dept. of Engineering, Report No. 62-63, Los Angeles, 1962.
[133] A. Valero, L. Serra, J. Uche, J. Energy Resources Technol. 128(1) (2006) 1. 
[134] R.U. Ayres, L.W. Ayres, K. Martinas, Energy 23(5) (1998) 355. 
[135] R.U. Ayres, Ecological Economics 26 (1998) 189. 

[136] E.T. Jaynes (1979), in E.T. Jaynes (R.D. Rosenkrantz, ed.), Papers on Probability, Statistics and Statistical Physics, D. Reidel Publ. Co., Dordrecht, Holland, (1983) 315.
[137] L. Boltzmann, Annalen der Physik 57 (1896) 773; English transl.: S.G. Brush, Kinetic Theory, Vol. 2: Irreversible Processes, Pergamon Press, Oxford, (1966) 218.
[138] A. Einstein, Annalen der Physik 17 (1905) 132; English transl.: A. Beck, P. Havas, The Collected Papers of Albert Einstein, Princeton Univ. Press, NJ, 2 (1989) 86.
[139] S.A. Book, in S. Kotz, N.L. Johnson (eds.), Encyclopedia of Statistical Sciences, John Wiley, NY, 4 (1983) 476.
[140] K. Pearson, Philos. Mag., 5th Series, 50 (1900) 157. 
[141] R.A. Fisher, J. Royal Stat. Soc. 87 (1924) 442. 

[142] P.G. Hoel, Introduction to Mathematical Statistics, 3rd ed., John Wiley, NY, 1962.
[143] M.E. Wise, Biometrika 50(1/2) (1963) 145. 

[144] J.N. Kapur, H.C. Saxena, Mathematical Statistics, 5th ed., S. Chand & Co., Delhi, 1969. 
[145] H. Suyari, (2004) cond-mat/0401546.
[146] H. Suyari, (2004) cond-mat/04015U.



[147] H. Suyari, M. Tsukada, Y. Uesaka, IEEE Int. Symp. Inform. Theory, Adelaide, Australia, 4-9 Sept. 2005.
[148] H. Suyari, Physica A 368(1) (2006) 63. 
[149] J. Naudts, Physica A 340 (2004) 32. 




[150] G. Kaniadakis, M. Lissia, A.M. Scarfone, Physica A 340 (2004) 41. 
[151] E.T. Jaynes, Am. J. Phys. 33 (1965) 391. 

[152] E.T. Jaynes, in G.J. Erickson, C.R. Smith (eds.), Maximum-Entropy and Bayesian Methods in Science and Engineering, Kluwer, Dordrecht, 1 (1988) 1.
[153] R. Bhandari, Pramana 6(3) (1976) 135. 
[154] H. Jeffreys, Proc. Royal Soc. London A 138(834) (1932) 48. 
[155] H. Jeffreys, Theory of Probability, 3rd ed., Clarendon Press, Oxford, 1961. 
[156] A. Einstein, Annalen der Physik 17 (1905) 549; English transl.: A. Beck, P. Havas, The Collected Papers of Albert Einstein, Princeton Univ. Press, NJ, 2 (1989) **.
[157] C. Vignat, A. Plastino, Phys. Lett. A 343 (2005) 411. 

[158] J.A. Wheeler, in W.H. Zurek (ed.), Complexity, Entropy and the Physics of Information, Addison-Wesley Publ. Co., Redwood City, CA, 8 (1990) 3.






